Methods and compositions for prognostic and/or diagnostic subtyping of pancreatic cancer

ABSTRACT

Methods for generating a prognostic and/or subtype signature for a subject with pancreatic ductal adenocarcinoma (PDAC) are provided. In some embodiments, the methods include determining expression levels for one or more genes listed in Tables 2-5, 9, 10, or 11, and/or the DE-S and/or DE-T subset of genes in PDAC cells obtained from the subject, wherein the determining provides a prognostic and/or subtype signature for the subject. Also provided are methods for classifying a subject diagnosed with pancreatic ductal adenocarcinoma (PDAC) as having an activated stroma subtype or a normal stroma subtype of PDAC and/or a basal subtype or a classical subtype of PDAC; and methods for identifying a differential treatment strategy for a subject diagnosed with pancreatic ductal adenocarcinoma (PDAC).

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 15/518,900, filed Apr. 13, 2017 (pending), which itself is a United States National Stage Application filed under 35 U.S.C. § 371 of PCT International Patent Application Serial No. PCT/2015/055565, filed Oct. 14, 2015, which itself is based on and claims priority to U.S. Provisional Patent Application Ser. No. 62/201,793, filed Aug. 6, 2015 and U.S. Provisional Patent Application Ser. No. 62/063,719, filed Oct. 14, 2014. The disclosure of each of these applications is incorporated by reference herein in its entirety.

GOVERNMENT INTEREST

This invention was made with United States government support under Grant Nos. CA009156 and CA014024 awarded by the National Institutes of Health of the United States. The United States government has certain rights in the invention.

REFERENCE TO SEQUENCE LISTING

The Sequence Listing associated with the instant disclosure has been electronically submitted to the United States Patent and Trademark Office as the Receiving Office as a 521 kilobyte ASCII text file created on May 26, 2021 and entitled “421_357_2_PCT_US_CON_ST25.txt”. The Sequence Listing submitted via EFS-Web is hereby incorporated by reference in its entirety.

TECHNICAL FIELD

The presently disclosed subject matter relates to compositions and methods for producing gene expression profiles for subjects that have or are suspected of having pancreatic cancer and employing the same to identify appropriate treatment approaches.

BACKGROUND

Pancreatic ductal adenocarcinoma (PDAC), which accounts for over 90% of all pancreatic cancers, remains a lethal disease, with an estimated 232,000 new cases and 227,000 deaths worldwide in 2008 (Parkin et al., 2002; Boyle & Levin, 2008). Incremental improvements in the treatment of this cancer have been made in the last two decades, but the estimated five-year survival worldwide remains less than 5% (Boyle & Levin, 2008).

Currently, the standard of care for the 20% of patients who are diagnosed with localized disease is surgery followed by chemotherapy with gemcitabine. Unfortunately, despite the use of adjuvant therapy, median survival remains at less than two years (Neuhaus et al., 2008), with only 12% of patients undergoing curative surgery surviving more than five years (Conlon et al., 1996; Ahmad et al., 2001; Cleary et al., 2004; Han et al., 2006; Winter et al., 2006; Ferrone et al., 2008; Schnelldorfer et al., 2008).

PDAC is thus characterized by a lack of effective targeted therapies, clinically useful biomarkers, and consensus subtypes. Understanding the molecular mechanisms underlying PDAC could therefore facilitate the development of rationally designed therapies and help tailor their use to individual patients. Interestingly, in large retrospective studies examining actual long-term (five- and ten-year) survivors (Conlon et al., 1996; Ahmad et al., 2001; Cleary et al., 2004; Han et al., 2006; Winter et al., 2006; Ferrone et al., 2008; Schnelldorfer et al., 2008), only two studies (Ahmad et al., 2001; Winter et al., 2006) found that adjuvant therapy was associated with improved survival, suggesting that the benefit of adjuvant therapy remains controversial. In addition, gene sequencing of rare long-term survivors suggests that the gene mutations in their tumors are no different from those in PDAC patients with more aggressive disease. One possible conclusion from these studies is that tumor biology in PDAC is more complex than gene mutations alone. Unfortunately, previous gene expression studies have been hampered by the low cellularity of malignant epithelium in PDAC patient samples. Low cellularity also poses a diagnostic dilemma, because tumor biopsies are frequently non-diagnostic.

Despite these difficulties, defining subtypes of PDAC that would dictate the type of therapy (whether tumor extirpation, chemotherapy, or molecular therapy and immunotherapy) and the timing of those therapies would benefit patients. For PDAC in particular, better diagnostic tests that are independent of tumor cellularity would also be beneficial. Achieving these goals is central to precision medicine.

SUMMARY

This Summary lists several embodiments of the presently disclosed subject matter, and in many cases lists variations and permutations of these embodiments. This Summary is merely exemplary of the numerous and varied embodiments. Mention of one or more representative features of a given embodiment is likewise exemplary. Such an embodiment can typically exist with or without the feature(s) mentioned; likewise, those features can be applied to other embodiments of the presently disclosed subject matter, whether listed in this Summary or not. To avoid excessive repetition, this Summary does not list or suggest all possible combinations of such features.

In some embodiments, the presently disclosed subject matter provides methods for generating a prognostic and/or subtype signature for a subject with pancreatic ductal adenocarcinoma (PDAC). In some embodiments, the methods comprise determining expression levels for one or more genes selected from the group consisting of those genes listed in Tables 2-5 in PDAC cells obtained from the subject, wherein the determining provides a prognostic and/or subtype signature for the subject. In some embodiments, the methods comprise determining expression levels for one or more genes listed in Table 1 as corresponding to the DE-S or DE-T subset in PDAC cells obtained from the subject, wherein the determining provides a prognostic and/or subtype signature and/or subtype identification that can be a diagnostic, prognostic, and/or treatment-determinative call for the subject. In some embodiments, the methods comprise determining expression levels for all of the genes listed in Tables 2-5 and/or for all of the genes listed in Table 1 as corresponding to the DE-S or DE-T subset in PDAC cells obtained from the subject.

In some embodiments, the methods further comprise comparing a first prognostic and/or subtype signature determined for the genes in Table 2 to a second prognostic and/or subtype signature for the genes in Table 3, wherein the comparing classifies the subject as having a PDAC subtype that is associated with either normal or activated stroma.

In some embodiments, the methods further comprise comparing a first prognostic and/or subtype signature determined for the genes in Table 4 to a second prognostic and/or subtype signature for the genes in Table 5, wherein the comparing classifies the subject as having a PDAC subtype that is a classical subtype or a basal subtype.

The presently disclosed subject matter also provides methods for classifying a subject diagnosed with pancreatic ductal adenocarcinoma (PDAC) as having an activated stroma subtype or a normal stroma subtype of PDAC. In some embodiments, the methods comprise (a) determining expression levels of the genes listed in Table 2 or an informative subset thereof and in Table 3 or an informative subset thereof in a biological sample comprising PDAC cells obtained from the PDAC of the subject; (b) creating an expression profile, wherein the expression profile encompasses expression levels of the genes listed in Table 2 or the informative subset thereof and the genes listed in Table 3 or the informative subset thereof; and (c) using the expression profiles created in an analysis of top scoring pairs of genes, wherein the analysis employs a trained logistic model into which binary inputs from discriminatory gene pairs are entered and from which classification odds are produced, whereby the subject is classified as having an activated stroma subtype or a normal stroma subtype of PDAC. In some embodiments, the method comprises comparing the expression profiles created to a standard, wherein the comparing employs a Bayesian classification reflecting a distance from (1) an activated stroma centroid that is high magnitude for all activated stroma genes and low magnitude for all normal stroma discriminatory genes; and (2) a normal stroma centroid that is high magnitude for all normal stroma genes and low magnitude for all activated stroma discriminatory genes. In some embodiments, the comparing determines whether the expression profile is closer to the activated stroma centroid or the normal stroma centroid, whereby the subject is classified as having an activated stroma subtype or a normal stroma subtype of PDAC.
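The centroid comparison described above can be illustrated with a minimal sketch. This is not the disclosed Bayesian classification; it is a simplified nearest-centroid stand-in that assumes expression values have already been Z-normalized per gene, encodes "high magnitude" as +1 and "low magnitude" as -1 (an assumed encoding), and uses Euclidean distance. The function names are hypothetical. The example genes POSTN and FAP (marked A in Table 1) and DES and MYH11 (marked N in Table 1) are drawn from the DE-S subset, and the expression values are illustrative only.

```python
import math


def build_centroids(activated_genes, normal_genes):
    """Idealized centroids: +1 ("high") for a subtype's own genes, -1 ("low") for the other subtype's genes."""
    genes = list(activated_genes) + list(normal_genes)
    activated = {g: (1.0 if g in activated_genes else -1.0) for g in genes}
    normal = {g: (1.0 if g in normal_genes else -1.0) for g in genes}
    return activated, normal


def euclidean(profile, centroid):
    """Euclidean distance between a Z-normalized profile and a centroid, over the centroid's genes."""
    return math.sqrt(sum((profile[g] - c) ** 2 for g, c in centroid.items()))


def nearest_centroid_stroma(profile, activated_genes, normal_genes):
    """Call the stroma subtype whose centroid is closer; ties default to normal stroma."""
    activated, normal = build_centroids(activated_genes, normal_genes)
    if euclidean(profile, activated) < euclidean(profile, normal):
        return "activated stroma"
    return "normal stroma"


# Illustrative Z-normalized profile (made-up values).
profile = {"POSTN": 1.2, "FAP": 0.8, "DES": -0.5, "MYH11": -1.1}
print(nearest_centroid_stroma(profile, {"POSTN", "FAP"}, {"DES", "MYH11"}))
```

A fixed ±1 centroid makes the rule depend only on the direction of each gene's deviation from the cohort mean; a trained classifier would instead estimate centroids and variances from labeled samples.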
In some embodiments, the expression profiles comprise expression levels for each of the genes listed in Table 10, and the using comprises calculating a value d using EQUATION 2,

$$
\begin{aligned}
P_i &= \begin{cases} 1 & \text{if } A_i > B_i \\ 0 & \text{if } B_i \geq A_i \end{cases} \\
d &= I + \sum_i P_i C_i \\
\text{decision} &= \begin{cases} \text{Activated Stroma} & \text{if } d > 0 \\ \text{Normal Stroma} & \text{if } d \leq 0 \end{cases}
\end{aligned}
\tag{EQUATION 2}
$$

wherein $A_i$ and $B_i$ are the measured expression levels of Gene A and Gene B, respectively, in the $i$-th row of Table 10, $C_i$ is the $i$-th coefficient, and $I$ is the intercept, and further wherein if $d$ is greater than 0, the subject is classified as having an activated stroma subtype, and if $d$ is less than or equal to 0, the subject is classified as having a normal stroma subtype of PDAC.
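EQUATION 2 can be evaluated directly once the gene pairs, coefficients, and intercept are known. The following is a minimal sketch; because the contents of Table 10 are not reproduced here, the gene names `GENE_A1`, `GENE_B1`, etc., the coefficients, and the intercept in the usage example are hypothetical placeholders, as are the function names.

```python
def stroma_score(expression, pairs, coefficients, intercept):
    """Compute d = I + sum_i P_i * C_i as in EQUATION 2.

    expression   -- dict mapping gene symbol to measured expression level
    pairs        -- list of (gene_a, gene_b) tuples, one per row of the pair table
    coefficients -- list of C_i values, in the same order as `pairs`
    intercept    -- the intercept I of the trained logistic model
    """
    if len(pairs) != len(coefficients):
        raise ValueError("each gene pair needs exactly one coefficient")
    d = intercept
    for (gene_a, gene_b), c_i in zip(pairs, coefficients):
        # P_i is 1 only when Gene A is strictly higher than Gene B; ties give 0.
        p_i = 1 if expression[gene_a] > expression[gene_b] else 0
        d += p_i * c_i
    return d


def classify_stroma_tsp(expression, pairs, coefficients, intercept):
    """Apply the EQUATION 2 decision rule: d > 0 -> Activated Stroma, otherwise Normal Stroma."""
    d = stroma_score(expression, pairs, coefficients, intercept)
    return "Activated Stroma" if d > 0 else "Normal Stroma"


# Hypothetical two-pair model (placeholder genes and coefficients).
pairs = [("GENE_A1", "GENE_B1"), ("GENE_A2", "GENE_B2")]
coefficients = [1.5, -0.8]
sample = {"GENE_A1": 5.0, "GENE_B1": 3.0, "GENE_A2": 2.0, "GENE_B2": 4.0}
print(classify_stroma_tsp(sample, pairs, coefficients, intercept=-0.5))
```

Because each $P_i$ depends only on the relative order of two genes within the same sample, the score does not require normalization across samples or platforms, which is a common motivation for top-scoring-pair rules.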

The presently disclosed subject matter also provides methods for classifying a subject diagnosed with pancreatic ductal adenocarcinoma (PDAC) as having a basal subtype or a classical subtype of PDAC. In some embodiments, the methods comprise (a) determining expression levels of the genes listed in Table 4 or an informative subset thereof and in Table 5 or an informative subset thereof in a biological sample comprising PDAC cells obtained from the PDAC of the subject; (b) creating an expression profile, wherein the expression profile encompasses expression levels of the genes listed in Table 4 or the informative subset thereof and the genes listed in Table 5 or the informative subset thereof; and (c) using the expression profiles created in an analysis of top scoring pairs of genes, wherein the analysis is composed of a trained logistic model into which binary inputs from discriminatory gene pairs are entered and from which classification odds are produced, whereby the subject is classified as having a basal subtype or a classical subtype of PDAC. In some embodiments, the method comprises (c) comparing the expression profiles created to a standard, wherein the comparing employs a Bayesian classification reflecting a distance from (1) a basal centroid that is high magnitude for all basal genes and low magnitude for all classical discriminatory genes; and (2) a classical centroid that is high magnitude for all classical genes and low magnitude for all basal discriminatory genes. In some embodiments, the comparing determines whether the expression profile is closer to the basal centroid or the classical centroid, whereby the subject is classified as having a basal subtype or a classical subtype of PDAC. In some embodiments, the expression profiles comprise expression levels for each of the genes listed in Table 11, and the using comprises calculating a value d using EQUATION 3,

$$
\begin{aligned}
P_i &= \begin{cases} 1 & \text{if } A_i > B_i \\ 0 & \text{if } B_i \geq A_i \end{cases} \\
d &= I + \sum_i P_i C_i \\
\text{decision} &= \begin{cases} \text{Basal-like} & \text{if } d > 0 \\ \text{Classical} & \text{if } d \leq 0 \end{cases}
\end{aligned}
\tag{EQUATION 3}
$$

wherein $A_i$ and $B_i$ are the measured expression levels of Gene A and Gene B, respectively, in the $i$-th row of Table 11, $C_i$ is the $i$-th coefficient, and $I$ is the intercept, and further wherein if $d$ is greater than 0, the subject is classified as having a basal-like subtype, and if $d$ is less than or equal to 0, the subject is classified as having a classical subtype of PDAC.
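Because the classifier is described as a trained logistic model that produces classification odds, $d$ can be read as a log-odds value under the assumption that $d$ is the model's linear predictor. The sketch below makes that assumption explicit: it computes $d$ for EQUATION 3 and converts it to a probability of the basal-like call with the logistic function, so that $d > 0$ corresponds to a probability above 0.5. The gene pairs, coefficient, and intercept in the usage example are hypothetical placeholders for the contents of Table 11, and the function name is illustrative.

```python
import math


def basal_classical_call(expression, pairs, coefficients, intercept):
    """Return (d, probability of basal-like, call) under EQUATION 3.

    Assumes d is the linear predictor (log-odds) of the trained logistic model.
    """
    if len(pairs) != len(coefficients):
        raise ValueError("each gene pair needs exactly one coefficient")
    # P_i = 1 when A_i > B_i, so only the coefficients of such pairs contribute to d.
    d = intercept + sum(c for (a, b), c in zip(pairs, coefficients)
                        if expression[a] > expression[b])
    # Numerically stable logistic transform of the log-odds d.
    if d >= 0:
        probability = 1.0 / (1.0 + math.exp(-d))
    else:
        probability = math.exp(d) / (1.0 + math.exp(d))
    call = "Basal-like" if d > 0 else "Classical"
    return d, probability, call


# Hypothetical one-pair model (placeholder genes and coefficient).
print(basal_classical_call({"GENE_A1": 4.0, "GENE_B1": 1.0},
                           [("GENE_A1", "GENE_B1")], [2.0], intercept=-1.0))
```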

In some embodiments, the presently disclosed subject matter also provides methods for identifying a differential treatment strategy for a subject diagnosed with pancreatic ductal adenocarcinoma (PDAC) and/or for diagnosing PDAC in low-cellularity biopsies. In some embodiments, the methods comprise (a) determining the expression levels of the genes listed in Tables 2-5 in a biological sample comprising PDAC cells obtained from the PDAC of the subject; (b) creating an expression profile for the subject based on the expression levels of the genes listed in Tables 2-5; (c) classifying the subject as having an activated stroma subtype or a normal stroma subtype of PDAC, a basal subtype or a classical subtype of PDAC, or both; and (d) selecting a treatment strategy for the subject based on the classification of the subject as having an activated stroma subtype or a normal stroma subtype of PDAC, a basal subtype or a classical subtype of PDAC, an activated stroma/basal subtype of PDAC, a normal stroma/basal subtype of PDAC, an activated stroma/classical subtype of PDAC, or a normal stroma/classical subtype of PDAC, wherein a differential treatment strategy for the subject is identified. In some embodiments, the method further comprises (e) diagnosing PDAC in a subject whose sample contains inadequate tumor cells by classifying the subject as having an activated stroma subtype or a normal stroma subtype of PDAC.
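Step (c) can yield a combined stroma/tumor call when both classifications are made. The sketch below shows only that combination step, assuming the stroma call comes from the sign of $d$ in EQUATION 2 and the tumor call from the sign of $d$ in EQUATION 3; the function name is hypothetical, and the selection of a treatment strategy from the combined subtype (step (d)) is not modeled here.

```python
def combined_subtype(stroma_d, tumor_d):
    """Combine EQUATION 2 (stroma) and EQUATION 3 (tumor) scores into one of four labels."""
    stroma = "activated stroma" if stroma_d > 0 else "normal stroma"
    tumor = "basal" if tumor_d > 0 else "classical"
    return f"{stroma}/{tumor}"


# A positive stroma score and a non-positive tumor score.
print(combined_subtype(0.7, -0.2))
```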

In some embodiments of the instantly disclosed methods where the genes to be assayed are those set forth in Tables 2-5, the genes referred to herein as DE-S and/or DE-T can be employed rather than those in Tables 2-5.

In some embodiments of the presently disclosed methods, the subject is a human.

It is thus an object of the presently disclosed subject matter to provide methods for predicting outcomes of subjects with pancreatic cancer.

An object of the presently disclosed subject matter having been stated hereinabove, and which is achieved in whole or in part by the presently disclosed subject matter, other objects will become evident as the description proceeds when taken in connection with the accompanying Figures as best described herein below.

BRIEF DESCRIPTION OF THE FIGURES

The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawings will be provided by the Office upon request and payment of the necessary fee.

FIGS. 1A-1D are representative hematoxylin and eosin (H&E) staining of patient tumor samples. FIG. 1A depicts liver metastases showing regions of tumor and normal tissue. FIG. 1B depicts a primary pancreatic tumor sample showing normal pancreatic tissue and tumor cells in the same field. FIG. 1C depicts a primary pancreatic tumor with high tumor cellularity. FIG. 1D depicts a primary pancreatic tumor with abundant tumor stroma. Black arrowheads show areas of tumor stroma. Black arrows show areas of tumor. White arrowheads show normal tissue. Scale bars, 200 μm.

FIG. 2 depicts the percentage of tumor in primary pancreatic tumors in the UNC and International Cancer Genome Consortium (ICGC) cohorts.

FIGS. 3A-3D depict the results of successful deconvolution of normal tissue with non-negative matrix factorization (NMF). FIG. 3A is a cartoon depicting the major cell types in primary tumor and liver metastasis samples. FIG. 3B shows (above) an overlay of sample types (solid colors) with factor weights (grayscale heat maps) and (below) heat maps of five exemplar genes for all tumors and adjacent normal tissues. Gene expression shown in the heat map has been Z-normalized. FIG. 3C is a series of box-and-whisker plots comparing NMF factor weights across tissue types, with the corresponding t-test results. FIG. 3D is a series of plots showing percent tumor cellularity versus NMF liver factor weight and NMF basal tumor factor weight for metastases to the liver and adjacent liver samples. Linear regression lines are shown in red along with the corresponding statistics.

FIGS. 4A-4D depict the results of a series of experiments demonstrating that the dual action of stroma is described by distinct gene expression patterns that are not expressed in cell lines. FIG. 4A is a consensus clustered heat map of UNC primary tumor samples, metastases, and cell lines using genes from stromal factors. Samples clustered into 3 groups: samples with activated stroma, samples with normal stroma, and samples with low or absent stromal gene expression. FIG. 4B is a Kaplan-Meier survival analysis of resected PDAC patients from the activated and normal stromal clusters, showing that samples in the activated stroma group have a worse prognosis, with a hazard ratio of 1.94 (CI=[1.11, 3.37], p=0.019). FIG. 4C shows that genes of the various stromal signatures were overexpressed in cancer associated fibroblasts (CAFs) as compared to tumor cell lines. FIG. 4D is a series of plots showing that genes from both stromal signatures were specifically overexpressed by the mouse stroma in PDX tumors, and not expressed by the human tumor cells.

FIG. 5 depicts how deconvolution of a large PDAC cohort revealed distinct gene expression patterns from multiple tissue types. Solid color bars above the heat map show the tissue of origin and tumor status of the samples, which were used to order the samples horizontally. Factor weights derived by NMF for selected factors are shown as grayscale bars. Heat maps show Z-normalized gene expression of five exemplar genes from each factor. All tumors, cell lines, and adjacent normal tissues from the present cohort are shown.

FIG. 6 depicts a correlation of pathology assessments of tumor with factor weights in normal pancreas and primary tumors. Horizontal axes all show tumor cellularity, while vertical axes show factor weight. Red dashed lines show best linear fits. p values are given for each R².

FIGS. 7A-7H depict the results of a series of experiments showing that tumor-specific gene expression suggested two subtypes of PDAC with similarities to other tumor types. FIG. 7A is a consensus clustered heat map of primary tumors, metastatic tumors, and cell line models of PDAC, using correlation as the underlying distance function, showing two subtypes of PDAC. FIG. 7B is a Kaplan-Meier survival analysis of resected primary patients from each tumor subtype in FIG. 7A (36 basal-like, 89 classical) showing differential prognosis among subtypes, with a hazard ratio of 1.89 and a 95% CI of [1.19, 3.02]. FIG. 7C is a consensus clustered heat map of tumors in the ICGC PDAC cohort split by basal and classical factor gene expression into basal-like (n=56) and classical (n=47) tumors. FIG. 7D is a plot showing that basal-like tumors in the ICGC data set had a hazard ratio of 2.11, with a 95% CI of [1.14, 3.89]. Median follow-up was 20 months. FIG. 7E is a consensus clustered heat map of The Cancer Genome Atlas (TCGA) bladder cancer (BLCA) samples split by basal and classical factor gene expression into basal-like (n=128) and classical-like (n=95) tumors; this split strongly agrees with the BASE47 basal calls shown above the heat map. FIG. 7F shows that subtyping in the TCGA BLCA data set yielded a hazard ratio of 1.43, with a 95% CI of [0.84, 2.42]. FIG. 7G is a consensus clustered heat map of the Perou breast cancer data set split by basal factor genes (n=72 basal-like, n=223 not basal); this split strongly agrees with the previously published division of samples into basal and non-basal subtypes. FIG. 7H shows that basal-like breast cancer, as defined by the presently disclosed subject matter, had a hazard ratio of 3.52, with a 95% CI of [1.94, 6.38].

FIGS. 8A-8F are a series of immunofluorescence images of cancer associated fibroblasts (CAFs) and control cells. FIGS. 8A-8C show CAFs stained using antibodies against EpCAM (FIG. 8A), vimentin (FIG. 8B), and SMAα (FIG. 8C). FIG. 8D shows staining of T3M4 cells as a positive control for EpCAM, FIG. 8E shows staining of T3M4 cells as a negative control for vimentin, and FIG. 8F shows staining of T3M4 cells as a negative control for SMAα. Scale bars are 50 μm.

FIG. 9 is a hierarchical clustering of Spearman correlation of samples from UNC, TCGA Bladder, and Perou data sets showing similarities among basal-like subtype samples. Color bars above the heat map show subtype, either from original publication (Known Tumor Subtype), or from the cross-platform classifier (Pan-platform classification).

FIGS. 10A-10C depict comparisons to the subtypes disclosed in Collisson et al., 2011. FIG. 10A is a consensus clustered heat map of normalized data from UNC and Collisson et al. using the gene sets of Collisson et al. Primary tumors, normal pancreas, and cell lines are shown. Collisson samples were previously classified as exocrine-like (magenta or black), classical (cyan or dark gray), and quasimesenchymal (yellow or light gray). FIG. 10B shows Kaplan-Meier plots of UNC samples classified by PAM into the subtypes of Collisson et al. FIG. 10C is a series of plots of mouse- and human-specific gene expression of the Collisson et al. gene lists in PDX, shown in log₂(1+RPKM). Classical genes are expressed by tumor cells, quasimesenchymal genes are expressed by a mix of human and mouse cells, and exocrine-like genes are expressed at low levels throughout.

FIGS. 11A-11E depict the results of multivariate survival analysis of tumor and stromal subtypes. FIG. 11A is a heat map of tumor samples using 25 genes from each of the tumor and stromal factors, with samples sorted horizontally by classification. Signature scores for selected gene sets appear above the heat map for each sample. FIG. 11B is a combined Kaplan-Meier survival analysis of resected primary patients with basal-like or classical tumor subtypes and normal or activated stroma subtypes, showing differential survival (p<0.001, log-rank test). The differential prognosis among the combined subtypes shows that the tumor and stromal subtypes are complementary. Classical tumors with normal stroma subtypes (n=24) had the lowest hazard ratio, 0.39 (95% CI [0.21, 0.73]), while basal-like tumors with activated stroma subtypes (n=26) had the highest hazard ratio, 2.28 (95% CI [1.34, 3.87]). FIG. 11C is a Kaplan-Meier survival analysis showing that patients with classical subtype tumors show less response to adjuvant therapy (HR=0.76, 95% CI [0.40, 1.43]) than patients with basal-like tumors, shown in FIG. 11D (HR=0.38, 95% CI [0.14, 1.09]). FIG. 11E is a Kaplan-Meier survival analysis showing that African-Americans have worse overall survival in both basal-like and classical subtypes, with a hazard ratio of 2.28 and a 95% CI of [1.16, 4.5].

FIGS. 12A-12D are a series of immunohistochemical panels of Collagen I staining to define mouse stroma in PDX. FIG. 12A shows anti-mouse Collagen I staining of stroma in a representative PDX tumor. FIG. 12B is a corresponding H&E stain of the section adjacent to that shown in FIG. 12A. Anti-mouse Collagen I staining of mouse skin (FIG. 12C) and human skin (FIG. 12D) are also depicted. Black arrowheads show areas of tumor stroma. Black arrows show areas of tumor. Scale bars, 200 μm.

FIGS. 13A and 13B depict the results of tumor gene expression in PDX models. FIG. 13A is a series of plots of mouse- and human-specific gene expression of the basal-like and classical subtype gene lists in 37 PDX tumors, shown in log₂(1+RPKM). Both gene sets were robustly expressed by the human (tumor) cells but not by the mouse (stroma) cells in PDX samples. FIG. 13B shows that consensus clustering of these PDX tumors using the basal-like and classical gene lists divides the samples into 2 groups.

FIGS. 14A-14I depict associations between tumor and stroma subtypes, PDX tumors, KRAS mutations, and SMAD4 expression. FIG. 14A is a series of pie charts showing that tumor subtype was not associated with PDX graft success rate (p=0.417). FIG. 14B is a series of pie charts showing that activated stromal subtype samples engrafted with higher success rates than low or normal stromal subtype samples (p=0.019). FIG. 14C is a plot showing that basal-like tumor subtype PDX reached 200 mm³ faster than classical subtype PDX (p=0.032). FIG. 14D is a plot showing that PDX from samples with activated stroma subtype or normal stroma subtype did not have significantly different times to reach 200 mm³ (p=0.170). FIG. 14E is a plot showing that PDX tumors with faster growth rates were associated with earlier recurrences in patients (HR=0.31, 95% CI [0.10, 0.92]). FIG. 14F is a series of pie charts showing that KRAS mutation type was not uniformly distributed among race or subtype; KRAS G12D mutations were more prevalent in basal-like subtype tumors than in classical tumors (p=0.030). FIG. 14G is a series of pie charts showing that African Americans had more G12V mutations, while Caucasians had more G12D mutations (p<0.001). FIG. 14H is a plot showing that SMAD4 staining in primary tumors was predictive of successful PDX engraftment (p=0.044). FIG. 14I is a plot showing that basal-like subtype PDX exhibited weaker SMAD4 staining than classical subtype PDX (p=0.015).

FIGS. 15A-15G are a series of immunohistochemical panels showing SMAD4 staining of representative patient and matched PDX tumors. Positive SMAD4 staining of a patient adenocarcinoma is shown in FIG. 15A, and the corresponding PDX at passage 4 is shown in FIG. 15B. SMAD4 loss in a patient adenocarcinoma is shown in FIG. 15C, and the corresponding PDX at passage 2 is shown in FIG. 15D. SMAD4 staining of control tissues is shown in human skin (FIG. 15E), mouse skin (FIG. 15F), and normal human pancreas (FIG. 15G). Scale bars are 200 μm.

FIG. 16 depicts a consensus clustered heat map of ICGC data for which genetic information was available. Color bars above the heat map show subtypes and genetic alterations for key genes in PDAC. Heat maps show Z-normalized gene expression of basal-like and classical tumor genes.

FIGS. 17A-17C are a series of plots showing gene signature scores by subtype, normalized across the cohort and calculated as the mean expression across a panel of genes obtained from MSigDB. FIG. 17A shows that the basal-like subtype showed downregulation of GATA6. FIG. 17B shows that the classical subtype tumors were enriched in genes associated with mucinous ovarian cancer. FIG. 17C shows that basal-like subtype tumors were enriched in genes related to KRAS activation and STK11 loss.
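The signature scores described for FIGS. 17A-17C combine two steps: averaging expression over the genes in a panel, and normalizing across the cohort. The sketch below assumes one plausible ordering (per-sample mean over the panel, then a Z-score of those means across all samples, using the population standard deviation); the exact normalization used for the figures is not specified here, and the sample names, gene names, and values are illustrative.

```python
from statistics import mean, pstdev


def signature_scores(expression, panel):
    """Cohort-normalized signature scores.

    expression -- {sample: {gene: expression value}}
    panel      -- genes in the signature (e.g., an MSigDB gene set)
    Returns {sample: Z-score of the sample's mean panel expression across the cohort}.
    """
    # Step 1: mean expression of the panel genes within each sample.
    raw = {sample: mean(values[g] for g in panel) for sample, values in expression.items()}
    # Step 2: normalize the per-sample means across the cohort.
    scores = list(raw.values())
    mu, sd = mean(scores), pstdev(scores)
    if sd == 0:
        return {sample: 0.0 for sample in raw}
    return {sample: (value - mu) / sd for sample, value in raw.items()}


cohort = {
    "s1": {"g1": 1.0, "g2": 3.0},
    "s2": {"g1": 3.0, "g2": 5.0},
    "s3": {"g1": 5.0, "g2": 7.0},
}
print(signature_scores(cohort, ["g1", "g2"]))
```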

FIGS. 18A-18C depict differences in extracellular mucin in classical and basal-like subtype tumors. FIG. 18A is a series of pie charts showing the number of samples with low (<10%) versus high (≥10%) extracellular mucin content. Representative H&E stains of a sample with a low degree (FIG. 18B) and a high degree (FIG. 18C) of extracellular mucin content are also depicted. Scale bars are 200 μm.

FIGS. 19A-19G depict the results of experiments showing that overcoming low tumor cellularity revealed true heterogeneity among matched primary and metastatic sites. FIG. 19A shows that sample-sample correlations of matched primary and metastatic tumors using the 50 most differentially expressed genes across all samples (“DE50”) caused samples to group by organ location. FIG. 19B shows that sample-sample correlations using 25 genes each from the classical and basal-like tumor lists (“T50”) caused samples to cluster instead by tumor subtype and patient of origin. FIG. 19C is a plot showing that the correlation of samples within the same patient was higher when using T50 genes than when using DE50 genes. FIG. 19D is a plot showing that the correlation of samples originating in the same organ was higher when using DE50 than when using T50. FIG. 19E shows that clustering of multiple samples from two patients using DE50 divides the samples by organ; genes expressed highly in lung and liver tissue are noted with brackets. FIG. 19F shows that clustering of the same samples from FIG. 19E using T50 genes separates the samples by patient; brackets note genes that differentiate the two patients. FIG. 19G is a diagram of the sampled locations for these patients, indicated by concentric circles, illustrating how samples simultaneously exhibit both patient-specific (inner color) and organ-specific (outer color) gene expression.
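The comparisons in FIGS. 19C and 19D average the sample-sample correlation over pairs of samples that share a patient or share an organ, computed on a chosen gene subset such as DE50 or T50. The sketch below assumes Pearson correlation (the correlation type used for these panels is not stated here) and uses a toy four-sample data set in which hypothetical "tumor" genes t1-t3 follow the patient and hypothetical "organ" genes o1-o3 follow the organ; all names and values are illustrative.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / (sxx * syy) ** 0.5


def mean_within_group_correlation(expression, genes, group_of):
    """Average correlation, over `genes`, of every sample pair sharing a group label."""
    values = []
    for s1, s2 in combinations(sorted(expression), 2):
        if group_of[s1] == group_of[s2]:
            x = [expression[s1][g] for g in genes]
            y = [expression[s2][g] for g in genes]
            values.append(pearson(x, y))
    if not values:
        raise ValueError("no sample pairs share a group label")
    return mean(values)


tumor_p1, tumor_p2 = [1, 2, 3], [3, 2, 1]   # patient-specific pattern
liver, lung = [1, 2, 3], [3, 2, 1]          # organ-specific pattern
samples = {"P1_liver": (tumor_p1, liver), "P1_lung": (tumor_p1, lung),
           "P2_liver": (tumor_p2, liver), "P2_lung": (tumor_p2, lung)}
expression = {name: dict(zip(["t1", "t2", "t3", "o1", "o2", "o3"], t + o))
              for name, (t, o) in samples.items()}
patient = {name: name.split("_")[0] for name in expression}
print(mean_within_group_correlation(expression, ["t1", "t2", "t3"], patient))
```

In this toy example, within-patient correlation is high on the tumor genes and negative on the organ genes, mirroring the qualitative contrast between T50 and DE50 described for FIG. 19C.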

FIG. 20 is a summary of exemplary, non-limiting treatment strategy considerations for patients with non-metastatic disease based on stromal subtype as identified using EQUATION 2 or tumor subtype as identified using EQUATION 3 below.

FIG. 21 is a summary of exemplary, non-limiting treatment strategy considerations for patients with metastatic disease based on stromal subtype as identified using EQUATION 2 or tumor subtype as identified using EQUATION 3 below.

BRIEF DESCRIPTION OF THE SEQUENCES

The biosequences summarized in Table 1 are Accession Numbers for exemplary human nucleic acid sequences that are present in the GENBANK® biosequence database, the expression of which can be assayed in the practice of the presently disclosed methods. It is noted that the GENBANK® biosequence database Accession Numbers presented in Table 1 are exemplary only and that other nucleic acids including but not limited to other transcript variants that are also listed in the GENBANK® biosequence database under the corresponding Gene Names and/or that are derived from the listed loci can be employed for the analysis of subjects. Similarly, in the event that any of the sequences set forth in Table 1 are updated in the GENBANK® biosequence database, the updated sequences are also understood to be encompassed by the presently disclosed subject matter.

TABLE 1 Listing of GENBANK® Accession Numbers for Nucleic Acid Sequences of Exemplary Human Gene Products

| Gene Symbol | GENBANK® Accession No. | SEQ ID NO. |
|---|---|---|
| ABCA8^(N) | NM_001288985.1 | 1 |
| ACTG2^(N) | NM_001615.3 | 2 |
| ADAMTS1^(N) | NM_006988.3 | 3 |
| AGR2^(C) | NM_006408.3 | 4 |
| AGR3^(C) | NM_176813.3 | 5 |
| ANGPTL7^(N) | NM_021146.3 | 6 |
| ANXA8L2^(B) | NM_001098845.2 | 7 |
| ANXA10^(C) | NM_007193.4 | 8 |
| AREG^(B) | NM_001657.3 | 9 |
| ATAD4^(C) | NM_024320.3 | 10 |
| ATP10B | NM_025153.2 | 11 |
| B3GNT5 | NM_032047.4 | 12 |
| BCAS1 | NM_003657.2 | 13 |
| BTNL8^(C) | NM_024850.2 | 14 |
| C2orf40^(N) | NM_032411.2 | 15 |
| C10orf116^(N) | NM_006829.2 | 16 |
| C16orf74 | NM_206967.2 | 17 |
| CAPN9 | NM_006615.2 | 18 |
| CD109 | NM_133493.4 | 19 |
| CDH11^(A) | NM_001797.2 | 20 |
| CDH17^(C) | NM_004063.3 | 21 |
| CDH19^(N) | NM_021153.3 | 22 |
| CEACAM6^(C) | NM_002483.6 | 23 |
| CHST6 | NM_021615.4 | 24 |
| CLRN3^(C) | NM_152311.3 | 25 |
| COL1A1^(A) | NM_000088.3 | 26 |
| COL1A2^(A) | NM_000089.3 | 27 |
| COL3A1^(A) | NM_000090.3 | 28 |
| COL5A1^(A) | NM_000093.4 | 29 |
| COL5A2^(A) | NM_000393.3 | 30 |
| COL10A1^(A) | NM_000493.3 | 31 |
| COL11A1^(A) | NM_001854.3 | 32 |
| COMP^(A) | NM_000095.2 | 33 |
| CST6^(B) | NM_001323.3 | 34 |
| CTHRC1^(A) | NM_138455.3 | 35 |
| CTSE^(C) | NM_001910.3 | 36 |
| CTSL2^(B) | NM_001333.3 | 37 |
| CYP3A7^(C) | NM_000765.4 | 38 |
| DCBLD2 | NM_080927.3 | 39 |
| DDC | NM_001082971.1 | 40 |
| DES^(N) | NM_001927.3 | 41 |
| DHRS9^(B) | NM_199204.1 | 42 |
| FABP4^(N) | NM_001442.2 | 43 |
| FAM3D^(C) | NM_138805.2 | 44 |
| FAM83A^(B) | NM_032899.5 | 45 |
| FAP^(A) | NM_004460.3 | 46 |
| FGFBP1^(B) | NM_005130.4 | 47 |
| FN1^(A) | NM_212482.1 | 48 |
| FNDC1^(A) | NM_032532.2 | 49 |
| GPM6B^(N) | NM_001001995.1 | 50 |
| GPR87^(B) | NM_023915.3 | 51 |
| GPR160 | NM_014373.2 | 52 |
| GREM1^(A) | NM_013372.6 | 53 |
| HPGD | NM_000860.5 | 54 |
| ID4^(N) | NM_001546.3 | 55 |
| IGF1^(N) | NM_001111283.1 | 56 |
| IL20RB | NM_144717.3 | 57 |
| INHBA^(A) | NM_002192.2 | 58 |
| ITGA11^(A) | NM_001004439.1 | 59 |
| KCNE3 | NM_005472.4 | 60 |
| KRT6A^(B) | NM_005554.3 | 61 |
| KRT6C^(B) | NM_173086.4 | 62 |
| KRT7^(B) | NM_005556.3 | 63 |
| KRT15^(B) | NM_002275.3 | 64 |
| KRT16 | NM_005557.3 | 65 |
| KRT17^(B) | NM_000422.2 | 66 |
| KRT20^(C) | NM_019010.2 | 67 |
| LEMD1^(B) | NM_001199050.1 | 68 |
| LGALS4^(C) | NM_006149.3 | 69 |
| LMOD1^(N) | NM_012134.2 | 70 |
| LOC400573^(C) | BC063383 | 71 |
| LPHN3^(N) | NM_015236.4 | 72 |
| LUM^(A) | NM_002345.3 | 73 |
| LY6D^(B) | NM_003695.2 | 74 |
| LYZ^(C) | NM_000239.2 | 75 |
| MEOX2^(N) | NM_005924.4 | 76 |
| MET | NM_001127500.1 | 77 |
| MMP11^(A) | NM_005940.3 | 78 |
| MS4A8B | NM_031457.1 | 79 |
| MSLN | NM_005823.5 | 80 |
| MYH11^(N) | NM_002474.2 | 81 |
| MYO1A^(C) | NM_001256041.1 | 82 |
| NAB1 | NM_005966.3 | 83 |
| OGN^(N) | NM_033014.2 | 84 |
| PLA2G10^(C) | NM_003561.1 | 85 |
| PLEKHA6 | NM_014935.4 | 86 |
| PLP1^(N) | NM_000533.3 | 87 |
| PLS1 | NM_001145319.1 | 88 |
| POSTN^(A) | NM_006475.2 | 89 |
| PPP1R14^(C) | NM_030949.2 | 90 |
| PTGES | NM_004878.4 | 91 |
| PTX3^(N) | NM_002852.3 | 92 |
| RBPMS2^(N) | NM_194272.1 | 93 |
| REG4^(C) | NM_001159352.1 | 94 |
| RERGL^(N) | NM_024730.3 | 95 |
| RSPO3^(N) | NM_032784.4 | 96 |
| S100A2^(B) | NM_005978.3 | 97 |
| SCEL^(B) | NM_144777.2 | 98 |
| SCRG1^(N) | NM_007281.2 | 99 |
| SERPINB3^(B) | NM_006919.2 | 100 |
| SERPINB4^(B) | NM_002974.3 | 101 |
| SERPINB5 | NM_002639.4 | 102 |
| SFRP2^(A) | NM_003013.2 | 103 |
| SLC2A1^(B) | NM_006516.2 | 104 |
| SLC44A4 | NM_025257.2 | 105 |
| SPARC^(A) | NM_003118.3 | 106 |
| SPINK4^(C) | NM_014471.1 | 107 |
| SPRR1B^(B) | NM_003125.2 | 108 |
| SPRR3^(B) | NM_005416.2 | 109 |
| ST6GALNAC1^(C) | NM_018414.4 | 110 |
| SULF1^(A) | NM_001128205.1 | 111 |
| SYNM^(N) | NM_145728.2 | 112 |
| SYTL2 | NM_001289610.1 | 113 |
| TFF1^(C) | NM_003225.2 | 114 |
| TFF2^(C) | NM_005423.4 | 115 |
| TFF3^(C) | NM_003226.3 | 116 |
| THBS2^(A) | NM_003247.3 | 117 |
| TMEM45B | NM_138788.3 | 118 |
| TNS4^(B) | NM_032865.5 | 119 |
| TSPAN8^(C) | NM_001168412.1 | 120 |
| UCA1^(B) | EU334869.1 | 121 |
| VCAN^(A) | NM_004385.4 | 122 |
| VGLL1^(B) | NM_016267.3 | 123 |
| VIT^(N) | NM_053276.3 | 124 |
| VSIG2^(C) | NM_014312.3 | 125 |
| ZNF469^(A) | NM_001127464.1 | 126 |

^(A)Member of the DE-S stromal subtype differentiation gene subset that is associated with the Activated stroma subtype
^(B)Member of the DE-T tumor subtype differentiation gene subset that is associated with the Basal tumor subtype
^(C)Member of the DE-T tumor subtype differentiation gene subset that is associated with the Classical tumor subtype
^(N)Member of the DE-S stromal subtype differentiation gene subset that is associated with the Normal stroma subtype

All of the nucleic acid sequences that correspond to the gene names listed in Table 1 and throughout the instant disclosure, including the corresponding GENBANK® biosequence database Accession Numbers, all annotations and references cited in the corresponding GENBANK® biosequence database entries, and all other nucleic acid sequences that correspond to the listed genetic loci that are present in the GENBANK® biosequence database and related annotations and references, are incorporated herein by reference in their entireties.

DETAILED DESCRIPTION

The present subject matter will now be described more fully hereinafter with reference to the accompanying Examples, in which representative embodiments of the presently disclosed subject matter are shown. The presently disclosed subject matter can, however, be embodied in different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the presently disclosed subject matter to those skilled in the art.

I. General Considerations

Pancreatic ductal adenocarcinoma (PDAC) remains a lethal disease with a 5-year survival rate of 4%. Roughly half of PDAC patients present with metastases at the time of diagnosis, and metastatic disease remains the primary cause of mortality. In this study, we set out to identify subtypes among PDAC patients, with a focus on understanding factors that contribute to patient outcome. A key hallmark of PDAC is extensive stromal and immune involvement, together with the presence of endocrine, exocrine, and normal ductal pancreatic cells. Additionally, metastatic samples often include cell types from the host organ. Thus, PDAC tumors are in fact complex mixtures in which malignant epithelial cells often represent only a minority of the bulk tumor. For this reason, normal and PDAC tissues often cluster separately from cell lines, which are assumed to be purely neoplastic (Iacobuzio-Donahue et al., 2003).

Separating the molecular signatures of individual tissue compartments from measurements of bulk tumor belongs to the general class of problems called blind source separation. Previous studies have used samples of chronic pancreatitis to control for the presence of desmoplastic stroma in tumor samples (Logsdon et al., 2003). In prostate cancer, Stuart et al. used pathologist assessments of cell types to train models of the gene expression signatures of tumor, stroma, and normal tissue (Stuart et al., 2004). In a follow-up study, they used the learned gene lists for in silico estimation of tissue components in a larger data set (Wang et al., 2010). A similar approach has also been used to quantify stromal content across multiple TCGA data sets (Yoshihara et al., 2013). Among source separation techniques, nonnegative matrix factorization (NMF) is especially well suited to biological data because it constrains all sources to be nonnegative, reflecting the goal of identifying positive gene expression exemplars rather than pairwise differences between tissue types. Alexandrov et al. have recently demonstrated that NMF is useful for the related problem of identifying mutational signatures from the aggregate list of somatic mutations in human cancer samples (Alexandrov et al., 2013a,b).

As disclosed herein, NMF was applied to a large microarray data set of primary and metastatic PDAC samples to evaluate tumor- and stroma-specific gene expression signatures. Briefly, NMF models the g × s expression matrix X (g genes, s samples) as the product of a g × k matrix G of gene weights for k factors and the transpose of an s × k matrix S of sample weights for the same k factors (X ≈ GS^T). By analyzing samples with mixed tumor and stroma cellularity, two tumor subtypes were identified and validated in multiple data sets, and important contributions from normal, immune, and stromal compartments were also identified.
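The factorization described above can be illustrated with a minimal Python sketch. The disclosure does not state which fitting procedure was used, so this example assumes the widely used Lee-Seung multiplicative updates for squared Frobenius error; the function name `nmf`, the random initialization, and the iteration count are illustrative choices, not the procedure actually used to derive the subtypes.

```python
import numpy as np


def nmf(X, k, n_iter=500, eps=1e-10, seed=0):
    """Approximate a nonnegative genes-by-samples matrix X as G @ S.T.

    G: g x k gene weights; S: s x k sample weights; both stay nonnegative.
    Uses Lee-Seung multiplicative updates for squared Frobenius error
    (an assumed fitting procedure, chosen for illustration).
    """
    X = np.asarray(X, dtype=float)
    if (X < 0).any():
        raise ValueError("NMF requires a nonnegative matrix")
    rng = np.random.default_rng(seed)
    g, s = X.shape
    G = rng.random((g, k)) + eps
    S = rng.random((s, k)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so signs never flip.
        S *= (X.T @ G) / (S @ (G.T @ G) + eps)
        G *= (X @ S) / (G @ (S.T @ S) + eps)
    return G, S
```

In such an analysis, the genes with the largest weights in each column of G would be candidate members of a tumor, stroma, or normal signature, and the corresponding column of S would indicate how strongly each sample expresses that factor. A maintained implementation is available as `sklearn.decomposition.NMF` in scikit-learn.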

II. Definitions

All technical and scientific terms used herein, unless otherwise defined below, are intended to have the same meaning as commonly understood by one of ordinary skill in the art. References to techniques employed herein are intended to refer to the techniques as commonly understood in the art, including variations on those techniques or substitutions of equivalent techniques that would be apparent to one of skill in the art. While the following terms are believed to be well understood by one of ordinary skill in the art, the following definitions are set forth to facilitate explanation of the presently disclosed subject matter.

Following long-standing patent law convention, the terms “a,” “an,” and “the” mean “one or more” when used in this application, including the claims. Thus, the phrase “a cell” refers to one or more cells, unless the context clearly indicates otherwise.

As used herein, the term “and/or” when used in the context of a list of entities, refers to the entities being present singly or in combination. Thus, for example, the phrase “A, B, C, and/or D” includes A, B, C, and D individually, but also includes any and all combinations and subcombinations of A, B, C, and D.

The term “comprising,” which is synonymous with “including,” “containing,” and “characterized by,” is inclusive or open-ended and does not exclude additional, unrecited elements and/or method steps. “Comprising” is a term of art that means that the named elements and/or steps are present, but that other elements and/or steps can be added and still fall within the scope of the relevant subject matter.

As used herein, the phrase “consisting of” excludes any element, step, and/or ingredient not specifically recited. For example, when the phrase “consists of” appears in a clause of the body of a claim, rather than immediately following the preamble, it limits only the element set forth in that clause; other elements are not excluded from the claim as a whole.

As used herein, the phrase “consisting essentially of” limits the scope of the related disclosure or claim to the specified materials and/or steps, plus those that do not materially affect the basic and novel characteristic(s) of the disclosed and/or claimed subject matter. For example, the presently disclosed subject matter in some embodiments can “consist essentially of” determining expression levels for one or more genes listed in Table 1 in PDAC cells present in a sample (e.g., a biopsy) obtained from a subject, which means that the recited gene(s) is/are the only genes for which an expression level or expression levels are determined. It is noted, however, that expression levels for various positive and/or negative control genes can also be determined, for example, to standardize and/or normalize the expression levels in PDAC cells of the genes employed, if desired, and still be within the scope of the phrase “consist essentially of determining expression levels for one or more genes listed in Table 1.”

With respect to the terms “comprising,” “consisting essentially of,” and “consisting of,” where one of these three terms is used herein, the presently disclosed and claimed subject matter can include the use of either of the other two terms. For example, it is understood that the methods of the presently disclosed subject matter in some embodiments comprise the steps that are disclosed herein and/or that are recited in the claims, in some embodiments consist essentially of the steps that are disclosed herein and/or that are recited in the claims, and in some embodiments consist of the steps that are disclosed herein and/or that are recited in the claims.

The term “subject” as used herein refers to a member of any invertebrate or vertebrate species. Accordingly, the term “subject” is intended to encompass any member of the Kingdom Animalia including, but not limited to, the phylum Chordata (i.e., members of Classes Osteichthyes (bony fish), Amphibia (amphibians), Reptilia (reptiles), Aves (birds), and Mammalia (mammals)), and all Orders and Families encompassed therein. In some embodiments, the presently disclosed subject matter relates to human subjects.

Similarly, all genes, gene names, and gene products disclosed herein are intended to correspond to orthologs from any species for which the compositions and methods disclosed herein are applicable. Thus, the terms include, but are not limited to, genes and gene products from humans. It is understood that when a gene or gene product from a particular species is disclosed, this disclosure is intended to be exemplary only, and is not to be interpreted as a limitation unless the context in which it appears clearly indicates that it is. Thus, for example, the genes and/or gene products disclosed herein are also intended to encompass homologous genes and gene products from other animals including, but not limited to, other mammals, fish, amphibians, reptiles, and birds.

The methods and compositions of the presently disclosed subject matter are particularly useful for warm-blooded vertebrates. Thus, the presently disclosed subject matter concerns mammals and birds. More particularly provided is the use of the methods and compositions of the presently disclosed subject matter on mammals such as humans and other primates, as well as those mammals of importance due to being endangered (such as Siberian tigers), of economic importance (animals raised on farms for consumption by humans) and/or social importance (animals kept as pets or in zoos) to humans, for instance, carnivores other than humans (such as cats and dogs), swine (pigs, hogs, and wild boars), ruminants (such as cattle, oxen, sheep, giraffes, deer, goats, bison, and camels), rodents (such as mice and rats), lagomorphs (such as rabbits), marsupials, and horses. Also provided is the use of the disclosed methods and compositions on birds, including those kinds of birds that are endangered, kept in zoos, as well as fowl, and more particularly domesticated fowl, e.g., poultry, such as turkeys, chickens, ducks, geese, guinea fowl, and the like, as they are also of economic importance to humans. Thus, also provided is the application of the methods and compositions of the presently disclosed subject matter to livestock, including but not limited to domesticated swine (pigs and hogs), ruminants, horses, poultry, and the like.

The term “about,” as used herein when referring to a measurable value such as an amount of weight, time, dose, etc., is meant to encompass variations of in some embodiments ±20%, in some embodiments ±10%, in some embodiments ±5%, in some embodiments ±1%, and in some embodiments ±0.1% from the specified amount, as such variations are appropriate to perform the disclosed methods and/or to employ the presently disclosed arrays.

As used herein the term “gene” refers to a hereditary unit including a sequence of DNA that occupies a specific location on a chromosome and that contains the genetic instruction for a particular characteristic or trait in an organism. Similarly, the phrase “gene product” refers to biological molecules that are the transcription and/or translation products of genes. Exemplary gene products include, but are not limited to mRNAs and polypeptides that result from translation of mRNAs. Any of these naturally occurring gene products can also be manipulated in vivo or in vitro using well known techniques, and the manipulated derivatives can also be gene products. For example, a cDNA is an enzymatically produced derivative of an RNA molecule (e.g., an mRNA), and a cDNA is considered a gene product. Additionally, polypeptide translation products of mRNAs can be enzymatically fragmented using techniques well known to those of skill in the art, and these peptide fragments are also considered gene products.

It is understood that while exemplary nucleotide sequences for the human orthologs of the genes listed in Table 1 are disclosed herein, orthologs of these genes from other species are also included within the presently disclosed subject matter.

The term “isolated,” as used in the context of a nucleic acid or polypeptide (including, for example, a nucleotide sequence, a polypeptide, and/or a peptide), indicates that the nucleic acid or polypeptide exists apart from its native environment. An isolated nucleic acid or polypeptide can exist in a purified form or can exist in a non-native environment.

Further, as used for example in the context of a cell, nucleic acid, polypeptide, or peptide, the term “isolated” indicates that the cell, nucleic acid, polypeptide, or peptide exists apart from its native environment. In some embodiments, “isolated” refers to a physical isolation, meaning that the cell, nucleic acid, polypeptide, or peptide has been removed from its native environment (e.g., from a subject).

The terms “nucleic acid molecule” and “nucleic acid” refer to deoxyribonucleotides, ribonucleotides, and polymers thereof, in single-stranded or double-stranded form. Unless specifically limited, the term encompasses nucleic acids containing known analogues of natural nucleotides that have similar properties as the reference natural nucleic acid. The terms “nucleic acid molecule” and “nucleic acid” can also be used in place of “gene,” “cDNA,” and “mRNA.” Nucleic acids can be synthesized, or can be derived from any biological source, including any organism.

As used herein, the terms “peptide” and “polypeptide” refer to polymers of at least two amino acids linked by peptide bonds. Typically, “peptides” are shorter than “polypeptides,” but unless the context specifically requires, these terms are used interchangeably herein.

As used herein, a cell, nucleic acid, or peptide exists in a “purified form” when it has been isolated away from some, most, or all components that are present in its native environment, but also when the proportion of that cell, nucleic acid, or peptide in a preparation is greater than would be found in its native environment. As such, “purified” can refer to cells, nucleic acids, and peptides that are free of all components with which they are naturally found in a subject, or are free from just a proportion thereof.

III. Methods for Generating Prognostic and/or Subtype Signatures

In some embodiments, the presently disclosed subject matter provides methods for generating prognostic and/or subtype signatures for a subject with cancer (e.g., pancreatic ductal adenocarcinoma (PDAC)). As used herein, the phrase “prognostic and/or subtype signature” refers to a gene expression profile comprising gene expression levels, determined in PDAC cells obtained from the subject, for one or more of the genes disclosed in Table 1. In some embodiments, a gene expression profile of the presently disclosed subject matter can comprise gene expression levels for one, five, ten, 25, 50, or 100 or more of the genes listed in Tables 2-5. In some embodiments, a gene expression profile of the presently disclosed subject matter can comprise gene expression levels for all of the genes listed in Tables 2-5.

As disclosed herein, such gene expression profiles can be predictive of various clinical outcomes, for example, by comparing to appropriate standards.

In some embodiments, methods for generating prognostic and/or subtype signatures further comprise comparing the derived prognostic and/or subtype signatures to one or more standards. As used herein, the term “standard” refers to an entity to which another entity (e.g., a prognostic and/or subtype signature) can be compared such that the comparison provides information of interest. An exemplary standard that is described herein is a test set. Additional discussion of standards can be found herein below. Such a comparison can be carried out on an apparatus, such as a system comprising a suitably programmed computer.

Thus, a profile can be created once an expression level is determined for a gene. As used herein, the term “profile” (e.g., a “gene expression profile”) refers to a repository of the expression level data that can be used to compare the expression levels of one or more genes, such as but not limited to one or more different genes among various subjects. For example, for a given subject, the term “profile” can encompass the expression levels of one or more of the genes disclosed herein detected in whatever units are chosen.

The term “profile” is also intended to encompass manipulations of the expression level data derived from a subject. For example, once relative expression levels are determined for a given set of genes in a subject, the relative expression levels for that subject can be compared to a standard to determine if the expression levels in that subject are higher or lower than for the same genes in the standard. Standards can include any data deemed to be relevant for comparison. Such a comparison can be carried out on an apparatus, such as a system comprising a suitably programmed computer. In some embodiments, an expression profile with respect to a plurality of the genes listed in Table 1 is presented such that a subject can be assigned into one particular treatment category (i.e., normal vs. activated stroma or classical vs. basal subtypes) based on the expression profile.
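The comparison of a subject's expression levels to a standard described above can be sketched minimally in Python. This example assumes, for illustration only, that the standard is the per-gene median across a set of reference samples; the function name `compare_to_standard` and the expression values in the usage are hypothetical, and the disclosure permits any data deemed relevant to serve as the standard.

```python
import statistics


def compare_to_standard(subject, reference_samples):
    """Call each gene 'higher', 'lower', or 'equal' relative to a standard.

    subject: dict mapping gene symbol -> expression level for one subject.
    reference_samples: list of dicts with the same genes; the per-gene
    median across these samples serves as the (assumed) standard.
    """
    calls = {}
    for gene, value in subject.items():
        standard = statistics.median(sample[gene] for sample in reference_samples)
        if value > standard:
            calls[gene] = "higher"
        elif value < standard:
            calls[gene] = "lower"
        else:
            calls[gene] = "equal"
    return calls


# Hypothetical values for two Table 1 genes.
reference = [{"KRT6A": 2.0, "TFF1": 5.0},
             {"KRT6A": 4.0, "TFF1": 7.0},
             {"KRT6A": 3.0, "TFF1": 6.0}]
print(compare_to_standard({"KRT6A": 8.0, "TFF1": 1.0}, reference))
```

With these invented values, KRT6A is called higher and TFF1 lower than the standard.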

IV. Methods for Selecting a Treatment

The presently disclosed subject matter also provides methods for selecting a treatment for a subject diagnosed with pancreatic ductal adenocarcinoma (PDAC). In some embodiments, the methods comprise assigning the subject into a classification based on an analysis of a gene expression profile with respect to one or more of the genes listed in Table 1, wherein the analysis classifies the subject as having a tumor that corresponds to either a normal or an activated stroma subtype, or alternatively to a classical or a basal subtype.

In some embodiments, a method for selecting a treatment comprises classifying a patient as having a normal or an activated stroma subtype, or a classical or a basal subtype, using one or more of Algorithms A-C described herein below.

IV.A. Overview of Exemplary Diagnostic Algorithms

The presently disclosed subject matter provides in some embodiments algorithms that can be employed for classifying PDAC subtypes in patient samples. In some embodiments, a particular algorithm is selected based on whether or not cytopathological assessment of the sample provides a reasonable basis for an initial diagnosis, and if so, whether the presence of metastatic disease is suggested thereby.

IV.A.1. Algorithm A: Diagnosing Pancreatic Cancer from a Non-Diagnostic Specimen on Traditional Cytopathology

Low tumor cellularity and high stroma content have long hampered the ability to diagnose pancreatic cancer from biopsies. According to pathology assessments, stroma comprises on average 39% of the primary tumor samples examined. At least 8% of endoscopic ultrasound biopsies are non-diagnostic (Gress et al., 2001). Biopsy results can alter the decision to proceed with surgery, an operation with attendant postoperative complication and hospital readmission rates of 59% and a mortality rate of 6% (DeOliveira et al., 2006; Eppsteiner et al., 2009; Yermilov et al., 2009). Therefore, clear biopsy results can be a key factor in correctly diagnosing patients and in assisting their physicians in determining appropriate treatment strategies.

The stroma subtypes disclosed herein have the potential to overcome the cellularity problem and provide a much-needed diagnostic tool that leverages the most abundant component of pancreatic cancer biopsies. An example of the decision-making process based on the genomic subtypes disclosed herein is described herein.

IV.A.2. Algorithm B: Diagnostic Specimen on Traditional Cytopathology or Diagnosis after Application of Algorithm A—Determining Tumor Subtype in the Non-Metastatic Setting

Even after operations performed with curative intent, pancreatic cancer patients whose tumors have been fully resected have a median survival of only 23 months (Neuhaus et al., 2008). The majority of these patients relapse with metastatic disease.

Thus, there has been much interest in using systemic therapies preoperatively (i.e., neoadjuvant approaches) in an attempt to treat micrometastatic disease that might be present at the time of surgery. The tumor and stroma subtypes disclosed herein are independently prognostic and diagnostic, and can add value in predicting patient outcome. Algorithm B provides an exemplary treatment approach based on the specific combination of tumor and stroma subtypes, with the classical/normal combination associated with the best outcome and the basal/activated combination associated with the worst.

IV.A.3. Algorithm C: Determining Tumor Subtype in the Metastatic Setting

Recent studies have shown two promising chemotherapeutic regimens for patients with metastatic pancreatic cancer (Louvet et al., 2005; Conroy et al., 2011). However, promising targeted therapies have been lacking. Algorithm C provides an exemplary treatment approach that depends on the subtype identified using the methods and compositions disclosed herein.
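The routing among Algorithms A-C described in this section can be summarized in a small Python sketch. It encodes only one reading of the section headings (a non-diagnostic specimen goes to Algorithm A first; once a diagnosis is available, metastatic status selects Algorithm C, otherwise Algorithm B); the function and parameter names are hypothetical.

```python
def select_algorithm(cytopathology_diagnostic: bool, metastatic: bool) -> str:
    """Return 'A', 'B', or 'C' according to the routing described for Algorithms A-C.

    A non-diagnostic specimen is routed to Algorithm A first. Once a diagnosis
    is available (from cytopathology, or after applying Algorithm A), the
    metastatic status chooses Algorithm B (non-metastatic) or C (metastatic).
    """
    if not cytopathology_diagnostic:
        return "A"
    return "C" if metastatic else "B"
```

After Algorithm A establishes a diagnosis, the same function would be called again with `cytopathology_diagnostic=True` to choose between Algorithms B and C.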

IV.B. Determination of Subtypes

Patient samples can be profiled for mRNA expression by any method that provides a quantitative analysis of gene expression. Non-limiting examples of such techniques include whole transcriptome RNAseq, targeted RNAseq, SAGE, RT-PCR (particularly QRT-PCR), and cDNA microarray analyses. With respect to the presently disclosed methods, expression is measured for genes from the following lists: (1) the four “core” expression lists, one for each of the four subtypes, which describe genes that are overexpressed in each subtype; and (2) the four “differential” expression lists, which define genes that are uniquely expressed in each subtype. Genes from the core lists are not mutually exclusive, as some genes are expressed by both tumor subtypes and could be relevant treatment targets in both groups. Genes from the core lists are used to select appropriate therapeutic targets for a particular subtype. Genes from the differential lists are, by design, mutually exclusive and represent the most discriminatory biomarkers for subtype diagnosis. For classification purposes, the union of the tumor subtype differential genes is referred to herein as “DE-T” (see Table 1), and the union of the stromal subtype differential genes is referred to herein as “DE-S” (see Table 1).

Two classifiers, one using DE-T and one using DE-S, are used to classify new samples within a Bayesian framework that allows for incorporation of a priori evidence, such as population prevalence, and for assessment of the confidence in each decision (Duda et al., 2012). For example, DE-S gene expression from an unknown sample is compared to the DE-S gene expression of each of two template centroids representing the two stromal subtypes. As another example, DE-T gene expression is assessed with a top-scoring-pairs logistic regression model to estimate the probability of class membership. Each sample is assigned to the subtype with the highest posterior probability (i.e., the maximum a posteriori class), together with an associated confidence level. Thus, each sample receives both a stroma and a tumor classification, each with an associated confidence, for clinical use.
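Because DE-T and DE-S are defined as unions of the differential lists, the two classifier gene sets can be assembled directly from the superscript markers used in Table 1. The following Python sketch assumes the Table 1 notation (symbol followed by ^(A), ^(B), ^(C), or ^(N)) and uses only a handful of Table 1 entries for brevity; the helper name `split_subsets` is hypothetical.

```python
import re

# A few Table 1 entries, written with their subtype markers.
TABLE1_SAMPLE = ["COL1A1^(A)", "FAP^(A)", "KRT6A^(B)", "S100A2^(B)",
                 "TFF1^(C)", "AGR2^(C)", "DES^(N)", "MYH11^(N)", "MET"]

MARKER = re.compile(r"^(?P<gene>[A-Za-z0-9]+)\^\((?P<tag>[ABCN])\)$")


def split_subsets(entries):
    """Return (DE-S, DE-T) as sets of gene symbols.

    DE-S = activated (A) and normal (N) stroma genes;
    DE-T = basal (B) and classical (C) tumor genes.
    Unmarked genes (e.g., MET) belong to neither differential subset.
    """
    subsets = {"A": set(), "B": set(), "C": set(), "N": set()}
    for entry in entries:
        match = MARKER.match(entry)
        if match:
            subsets[match["tag"]].add(match["gene"])
    de_s = subsets["A"] | subsets["N"]
    de_t = subsets["B"] | subsets["C"]
    return de_s, de_t
```

Because the differential lists are mutually exclusive by design, the two returned sets never share a gene.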
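The centroid comparison with a priori evidence described above can be sketched as a small maximum a posteriori classifier. This is an illustration, not the classifier used in the disclosure: it assumes (hypothetically) a Gaussian likelihood around each template centroid with a single shared standard deviation `sigma`, and the centroid values and priors in the usage below are invented.

```python
import numpy as np


def map_classify(x, centroids, priors, sigma=1.0):
    """Assign a sample to the maximum a posteriori (MAP) subtype.

    x: 1-D array of expression values for the classifier genes (e.g., DE-S),
       in the same gene order as each centroid.
    centroids: dict mapping subtype label -> template centroid (1-D array).
    priors: dict mapping subtype label -> prior probability (e.g., prevalence).
    sigma: assumed shared, isotropic standard deviation of the Gaussian
           likelihood around each centroid (an illustrative assumption).
    Returns the MAP label and a dict of posterior probabilities.
    """
    x = np.asarray(x, dtype=float)
    labels = list(centroids)
    # log prior + Gaussian log-likelihood; constant terms cancel on normalization
    log_post = np.array([
        np.log(priors[label])
        - np.sum((x - np.asarray(centroids[label], dtype=float)) ** 2) / (2 * sigma**2)
        for label in labels
    ])
    log_post -= log_post.max()  # avoid underflow before exponentiating
    post = np.exp(log_post)
    post /= post.sum()
    posteriors = {label: float(p) for label, p in zip(labels, post)}
    return labels[int(np.argmax(post))], posteriors


# Invented four-gene centroids for the two stromal subtypes.
centroids = {"Activated": [2.0, 2.0, 0.0, 0.0], "Normal": [0.0, 0.0, 2.0, 2.0]}
label, posteriors = map_classify([1.9, 2.1, 0.1, -0.1], centroids,
                                 {"Activated": 0.5, "Normal": 0.5})
```

The posterior of the chosen class serves as the confidence for the call; when a sample lies equally far from both centroids, the priors alone decide the assignment.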

Alternatively or in addition, the gene pairs disclosed in Tables 9-11 below can be employed for determining tumor and stromal subtypes in cancers including, but not limited to, cancers of the breast, bladder, or pancreas. For example, cancers in these tissues can be identified as being basal-like or not basal-like using the gene pairs disclosed in Table 9 below. To classify each sample, the expression levels of the two genes in each pair in Table 9 below are compared: for each gene pair, if Gene A expression is greater than Gene B expression, the coefficient for that gene pair is added to a running sum. If the sum of all such coefficients and the intercept from Table 9 below is greater than zero, the sample is classified as basal (see EQUATION 1).

Using the gene pairs in Table 9 below for breast, bladder, or pancreas, if A_(i) and B_(i) are the measured expression of Genes A and B of Table 9 in the i^(th) row, C_(i) is the i^(th) coefficient, and I is the intercept, then a decision can be calculated as follows:

$P_{i} = \begin{cases} 1 & \text{if } A_{i} > B_{i} \\ 0 & \text{if } B_{i} \geq A_{i} \end{cases} \qquad d = I + \sum_{i} P_{i} C_{i} \qquad \text{decision} = \begin{cases} \text{Basal} & \text{if } d > 0 \\ \text{Not Basal} & \text{if } d \leq 0 \end{cases} \qquad \text{(EQUATION 1)}$
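The top-scoring-pairs decision rule above can be written as a short Python function. The actual gene pairs, coefficients, and intercepts of Tables 9-11 are not reproduced here, so the gene names (`GENE_A1` and so on), coefficients, and intercept in the usage are hypothetical placeholders; only the rule itself (ties count as P_i = 0, and d = 0 falls in the second class) follows the equation.

```python
def tsp_decision(expression, pairs, intercept, labels=("Basal", "Not Basal")):
    """Score one sample with a top-scoring-pairs rule of the EQUATION 1 form.

    expression: dict gene -> measured expression for the sample.
    pairs: iterable of (gene_a, gene_b, coefficient) rows, i.e., A_i, B_i, C_i.
    intercept: the intercept I.
    labels: (label returned if d > 0, label returned if d <= 0).
    """
    d = intercept
    for gene_a, gene_b, coefficient in pairs:
        p_i = 1 if expression[gene_a] > expression[gene_b] else 0  # ties give 0
        d += p_i * coefficient
    return d, (labels[0] if d > 0 else labels[1])


# Hypothetical two-pair model; real pairs and coefficients come from Table 9.
pairs = [("GENE_A1", "GENE_B1", 1.5), ("GENE_A2", "GENE_B2", -0.8)]
sample = {"GENE_A1": 5.0, "GENE_B1": 3.0, "GENE_A2": 2.0, "GENE_B2": 4.0}
d, call = tsp_decision(sample, pairs, intercept=-0.5)
```

Here only the first pair satisfies A_i > B_i, so d = -0.5 + 1.5 = 1.0 and the call is "Basal". Passing `labels=("Activated Stroma", "Normal Stroma")` or `labels=("Basal-like", "Classical")` applies the same rule to EQUATIONS 2 and 3 with the Table 10 or Table 11 pairs, respectively. Because each comparison is within a single sample, the rule does not depend on cross-sample normalization.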

More particularly in the case of cancer of the pancreas, the gene pairs listed in Table 10 below can be employed for classifying a pancreas tumor as being of the activated stroma subtype or the normal stroma subtype. Using Table 10 below, if A_(i) and B_(i) are the measured expression of Genes A and B of Table 10 in the i^(th) row, C_(i) is the i^(th) coefficient, and I is the intercept, then a decision can be calculated as in EQUATION 2:

$P_{i} = \begin{cases} 1 & \text{if } A_{i} > B_{i} \\ 0 & \text{if } B_{i} \geq A_{i} \end{cases} \qquad d = I + \sum_{i} P_{i} C_{i} \qquad \text{decision} = \begin{cases} \text{Activated Stroma} & \text{if } d > 0 \\ \text{Normal Stroma} & \text{if } d \leq 0 \end{cases} \qquad \text{(EQUATION 2)}$

Also more particularly in the case of cancer of the pancreas, the gene pairs listed in Table 11 below can be employed for classifying a pancreas tumor as being of the basal subtype or the classical subtype. Using Table 11 below, if A_(i) and B_(i) are the measured expression of Genes A and B of Table 11 in the i^(th) row, C_(i) is the i^(th) coefficient, and I is the intercept, then a decision can be calculated as in EQUATION 3:

$P_{i} = \begin{cases} 1 & \text{if } A_{i} > B_{i} \\ 0 & \text{if } B_{i} \geq A_{i} \end{cases} \qquad d = I + \sum_{i} P_{i} C_{i} \qquad \text{decision} = \begin{cases} \text{Basal-like} & \text{if } d > 0 \\ \text{Classical} & \text{if } d \leq 0 \end{cases} \qquad \text{(EQUATION 3)}$
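The DE-T classifier is described above as a top-scoring-pairs logistic regression that estimates the probability of class membership. Assuming that the Table 11 coefficients and intercept are those of such a fitted logistic model, the score d would be the log-odds of basal-like membership, and a probability could be attached to each EQUATION 3 call as sketched below. This interpretation, and the function names, are assumptions made for illustration.

```python
import math


def logistic(d):
    """Map a log-odds score d to a probability without overflow."""
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    z = math.exp(d)  # d < 0, so z is in (0, 1) and cannot overflow
    return z / (1.0 + z)


def basal_classical_call(d):
    """Apply the EQUATION 3 rule and attach the assumed basal-like probability."""
    label = "Basal-like" if d > 0 else "Classical"
    return label, logistic(d)
```

Under this reading, the threshold d > 0 is the same as a basal-like probability above 0.5, so the probability adds a confidence to the call without changing it.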

IV.C. Determination of Subtype-specific Treatment Strategies

Many of the genes that are descriptive of each subtype do not yet have an available drug. However, the majority are potentially targetable, and as drugs become available, these genes are expected to guide therapeutic decisions.

At the current time, treatment of pancreatic cancer is limited to three regimens: gemcitabine, gemcitabine in combination with nab-paclitaxel (Von Hoff et al., 2013), and FOLFIRINOX (folinic acid (leucovorin), fluorouracil, irinotecan, and oxaliplatin; Conroy et al., 2011). Among patients with non-metastatic disease, those classified as classical/normal are offered surgery as the first stage of therapy. Patients classified as classical/activated, basal/activated, or basal/normal are offered chemotherapy (FOLFIRINOX or gemcitabine+nab-paclitaxel, depending on oncologist and patient preference and patient tolerance) prior to surgery, because outcomes after surgery are poor in patients with basal subtypes, with 50% of patients relapsing and dying about 1 year after the surgery that had been intended to cure the disease. As therapies in trial become available, all patients with activated subtypes will be offered stroma-modulating therapies (see examples described herein below) prior to surgery. In some embodiments, patients with basal subtypes derive greater benefit from chemotherapy after surgery as described herein.

For those patients with metastatic disease, the classical/normal subset of patients can proceed with currently available chemotherapies. For the subset of patients with other subtypes, therapies are tailored as described in more detail herein below. In some embodiments, different subtypes respond to different therapies, so as newer therapies are developed, the selected strategies can be altered.
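The first-line rules stated in the two preceding paragraphs can be condensed into a small lookup, sketched below in Python. It encodes only those stated rules (classical/normal versus every other tumor/stroma combination, split by metastatic status); the function name and the returned strategy strings are hypothetical labels, and the subtype-specific agents discussed elsewhere in this section are not modeled.

```python
FAVORABLE = ("classical", "normal")
TUMOR_SUBTYPES = {"classical", "basal"}
STROMA_SUBTYPES = {"normal", "activated"}


def initial_strategy(tumor: str, stroma: str, metastatic: bool) -> str:
    """Return a coarse first-line strategy label for a tumor/stroma subtype pair."""
    if tumor not in TUMOR_SUBTYPES or stroma not in STROMA_SUBTYPES:
        raise ValueError(f"unrecognized subtype combination: {tumor}/{stroma}")
    favorable = (tumor, stroma) == FAVORABLE
    if metastatic:
        # Classical/normal proceeds with currently available chemotherapy;
        # other combinations receive subtype-tailored therapy.
        return "available chemotherapy" if favorable else "subtype-tailored therapy"
    # Non-metastatic: classical/normal goes to surgery first; every other
    # combination receives chemotherapy before surgery.
    return "surgery first" if favorable else "chemotherapy before surgery"
```

Keeping the rules in one table-like function makes it straightforward to revise a single branch as newer therapies become available.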

Drug regimens can be further tailored by tumor and/or stroma subtype as drugs currently in early phase clinical trials become available. For instance, patients with activated stroma subtypes could benefit from extracellular matrix-associated therapies such as hyaluronidase treatment (currently in clinical trials) and/or collagenase treatment in combination with other therapies.

Patients with normal subtype tumors might not benefit from similar stroma-modulating agents, which conversely could be harmful. Rather, such patients' disease could be sensitive to anti-PDGFRB- or anti-TEK-directed therapy.

Patients with the basal subtype might benefit from AGS-14CD4, crizotinib, or erlotinib, or other kinase inhibitors that have anti-MET activity. Patients with classical subtypes might benefit from varespladib, cobicistat, trastuzumab, or other kinase inhibitors with anti-ERBB2 or anti-EGFR activity.

Finally, Table 6 shows a list of kinases that can be considered as therapeutic targets for patients with classical and basal subtype tumors.

Tables 2-5 list the genes that define each subtype and the currently known drugs and/or combination(s) of drugs that can be used based on the overall subtype. The gene lists in Tables 2-5 are descriptive for each subtype and are relevant to designing treatment regimens for each subtype, but are not necessarily mutually exclusive as multiple treatment possibilities can be considered for each subtype. For diagnostic purposes, subsets of these genes, which are unique to each subtype, were used (see DE-S and DE-T above).

Regardless of whether specific drugs have been effective in pancreatic cancer, the results disclosed herein suggest that pancreatic cancer is not a single disease and that, unless therapies are appropriately tailored, individual patients are unlikely to benefit from the current one-size-fits-all approach to treatment. The findings disclosed herein can thus be used to personalize therapies to individual patients by reference to their tumor and/or stroma subtype.

V. Methods of Gene Expression/Transcriptome Analysis

V.A. Assay Formats

The genes identified as being differentially expressed in, for example, normal subtype vs. activated stroma subtype PDAC, or alternatively classical subtype vs. basal subtype PDAC, can be used in a variety of nucleic acid detection assays to detect and/or quantitate the expression level of a gene or multiple genes in a given sample. For example, Northern blotting, nuclease protection, RT-PCR (e.g., quantitative RT-PCR; QRT-PCR), and/or differential display methods can be used for detecting gene expression levels. In some embodiments, methods and assays of the presently disclosed subject matter are employed with array or chip hybridization-based methods and systems for detecting the expression of a plurality of genes. However, it is noted that any nucleotide analysis method can be employed with the presently disclosed subject matter, including in some embodiments RNA sequencing and transcriptome analysis.

Any hybridization assay format can be used, including solution-based and solid support-based assay formats. Representative solid supports containing oligonucleotide probes for differentially expressed genes of the presently disclosed subject matter include filters, polyvinyl chloride dishes, silicon, glass-based chips, etc. Such wafers and hybridization methods are widely available and include, for example, those disclosed in PCT International Patent Application Publication WO 1995/011755. Any solid surface to which oligonucleotides can be bound, either directly or indirectly, either covalently or non-covalently, can be used. An exemplary solid support is a high-density array or DNA chip, which contains a particular oligonucleotide probe at each predetermined location on the array. Each predetermined location can contain more than one molecule of the probe, but in some embodiments each molecule within the predetermined location has an identical sequence. Such predetermined locations are termed features. There can be any number of features on a single solid support including, for example, about 2, 10, 100, 1000, 10,000, 100,000, or 400,000 features. The solid support, or the area within which the probes are attached, can be of any convenient size (for example, on the order of a square centimeter).

Oligonucleotide probe arrays for differential gene expression monitoring can be made and employed according to any techniques known in the art (see e.g., Lockhart et al., 1996; McGall et al., 1996). Such probe arrays can contain two or more oligonucleotides that are complementary to or hybridize to two or more of the genes described herein. Such arrays can also contain oligonucleotides that are complementary to or hybridize to at least about 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 50, 70, 100, or more of the nucleic acid sequences disclosed herein.

The genes that are assayed according to the presently disclosed subject matter are typically in the form of RNA (e.g., total RNA or mRNA) and/or reverse transcribed RNA (i.e., cDNA), including subsequences thereof. The genes can be cloned or not, and the genes can be amplified or not. In some embodiments, poly A⁺ RNA is employed as a source.

Probes based on the sequences of the genes described herein can be prepared by any commonly available method. Oligonucleotide probes for assaying the tissue or cell sample are in some embodiments of sufficient length to specifically hybridize only to appropriate complementary genes or transcripts. Typically, the oligonucleotide probes are at least 10, 12, 14, 16, 18, 20, or 25 nucleotides in length. In some embodiments, longer probes of at least 30, 40, 50, or 60 nucleotides are employed.

As used herein, oligonucleotide sequences that are complementary to one or more of the genes described herein are oligonucleotides that are capable of hybridizing under stringent conditions to at least part of the nucleotide sequence of said genes. Such hybridizable oligonucleotides will typically exhibit in some embodiments at least about 75% sequence identity, in some embodiments about 80% sequence identity, in some embodiments about 85% sequence identity, in some embodiments about 90% sequence identity, in some embodiments about 91% sequence identity, in some embodiments about 92% sequence identity, in some embodiments about 93% sequence identity, in some embodiments about 94% sequence identity, in some embodiments about 95% sequence identity, and in some embodiments greater than 95% sequence identity (e.g., 96%, 97%, 98%, 99%, or 100% sequence identity) at the nucleotide level to the nucleic acid sequences disclosed herein.
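The percent identity thresholds recited above can be illustrated with a simple ungapped comparison. The following Python sketch is illustrative only: it assumes that the probe and the target subsequence have already been aligned end to end without gaps (in practice an alignment tool would typically be used), and the function names and example sequences are hypothetical rather than taken from Tables 2-5.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence containing only A/C/G/T."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two equal-length sequences compared position by position (no gaps)."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


# Hypothetical 20-nt target subsequence, and a probe that is the reverse
# complement of a variant of that subsequence carrying two substitutions.
target = "ATGCGTACGTTAGCCGATAA"
probe = reverse_complement("ATGCGTACCTTAGCCGTTAA")

# Compare the probe's target-sense sequence with the target subsequence.
identity = percent_identity(reverse_complement(probe), target)
print(identity)  # 90.0, i.e., above an "at least about 75%" threshold
```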

“Bind(s) substantially” refers to complementary hybridization between a probe nucleic acid and a target nucleic acid and embraces minor mismatches that can be accommodated by reducing the stringency of the hybridization media to achieve the desired detection of the target polynucleotide sequence.

The terms “background” or “background signal intensity” refer to hybridization signals resulting from non-specific binding, or other interactions, between the labeled target nucleic acids and components of the oligonucleotide array (e.g., the oligonucleotide probes, control probes, the array substrate, etc.). Background signals can also be produced by intrinsic fluorescence of the array components themselves. A single background signal can be calculated for the entire array, or a different background signal can be calculated for each target nucleic acid. In some embodiments, background is calculated as the average hybridization signal intensity for the lowest 5% to 10% of the probes in the array, or, where a different background signal is calculated for each target gene, for the lowest 5% to 10% of the probes for each gene. Of course, one of skill in the art will appreciate that where the probes to a particular gene hybridize well and thus appear to be specifically binding to a target sequence, they should not be used in a background signal calculation. Alternatively, background can be calculated as the average hybridization signal intensity produced by hybridization to probes that are not complementary to any sequence found in the sample (e.g., probes directed to nucleic acids of the opposite sense or to genes not found in the sample such as bacterial genes where the sample is mammalian nucleic acids). Background can also be calculated as the average signal intensity produced by regions of the array that lack probes.
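The "lowest 5% to 10% of the probes" approach to background described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: intensities are plain numbers, probes that clearly hybridize specifically are assumed to have been excluded by the caller beforehand, and the function names and gene identifier are hypothetical.

```python
from statistics import mean


def lowest_fraction_background(intensities, fraction=0.05):
    """Average intensity of the lowest `fraction` of probes (at least one probe is used)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(intensities)
    if not ordered:
        raise ValueError("no intensities supplied")
    n = max(1, round(fraction * len(ordered)))
    return mean(ordered[:n])


def per_gene_background(probes_by_gene, fraction=0.05):
    """Separate background value for each target gene, from that gene's own probes."""
    return {gene: lowest_fraction_background(values, fraction)
            for gene, values in probes_by_gene.items()}


# Whole-array background from 100 hypothetical probe intensities (1..100).
array_intensities = list(range(1, 101))
print(lowest_fraction_background(array_intensities, 0.05))  # mean of 1..5 = 3
print(lowest_fraction_background(array_intensities, 0.10))  # mean of 1..10 = 5.5
```

Whether a single array-wide value or a per-gene value is used is a design choice; the per-gene variant can better reflect probe-specific cross-hybridization but depends on each gene having enough probes.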

Assays, methods, and systems of the presently disclosed subject matter can utilize available formats to simultaneously screen in some embodiments at least about 10, in some embodiments at least about 50, in some embodiments at least about 100, in some embodiments at least about 1000, in some embodiments at least about 10,000, and in some embodiments at least about 40,000 or more different nucleic acid hybridizations.

As used herein, a “probe” is defined as a nucleic acid that is capable of binding to a target nucleic acid of complementary sequence through one or more types of chemical bonds, usually through complementary base pairing, usually through hydrogen bond formation. As used herein, a probe can include natural (i.e., A, G, U, C, or T) or modified bases (7-deazaguanosine, inosine, etc.). In addition, the bases in probes can be joined by a linkage other than a phosphodiester bond, so long as it does not interfere with hybridization. Thus, probes can be peptide nucleic acids in which the constituent bases are joined by peptide bonds rather than phosphodiester linkages.

The terms “mismatch control” and “mismatch probe” refer to a probe comprising a sequence that is deliberately selected not to be perfectly complementary to a particular target sequence. For each mismatch (MM) control in a high-density array there typically exists a corresponding perfect match (PM) probe that is perfectly complementary to the same particular target sequence. The mismatch can comprise one or more bases.

While the mismatch(es) can be located anywhere in the mismatch probe, terminal mismatches are less desirable because a terminal mismatch is less likely to prevent hybridization of the target sequence. In some embodiments, the mismatch is located at or near the center of the probe such that the mismatch is most likely to destabilize the duplex with the target sequence under the test hybridization conditions.

The phrase “perfect match probe” refers to a probe that has a sequence that is perfectly complementary to a particular target sequence. The test probe is typically perfectly complementary to a portion (subsequence) of the target sequence. The perfect match (PM) probe can be a “test probe,” a “normalization control” probe, an expression level control probe, or the like. A perfect match control or perfect match probe is, however, distinguished from a “mismatch control” or “mismatch probe.”

V.B. Probe Design

Upon review of the present disclosure, one of skill in the art will appreciate that an enormous number of array designs are suitable for the practice of the presently disclosed subject matter. The high-density array typically includes a number of probes that specifically hybridize to the sequences of interest. See PCT International Patent Application Publication WO 1999/032660, incorporated herein by reference in its entirety, for methods of producing probes for a given gene or genes. In addition, in some embodiments, the array includes one or more control probes.

High-density array chips of the presently disclosed subject matter include in some embodiments “test probes.” Test probes can be oligonucleotides that in some embodiments range from about 5 to about 500 or about 5 to about 50 nucleotides, in some embodiments from about 10 to about 40 nucleotides, and in some embodiments from about 15 to about 40 nucleotides in length. In some embodiments, the probes are about 20 to 25 nucleotides in length. In some embodiments, test probes are double- or single-stranded DNA sequences, which can be isolated or cloned from natural sources and/or amplified from natural sources using natural nucleic acids as templates. These probes have sequences complementary to particular subsequences of the genes whose expression they are designed to detect. Thus, the test probes are capable of specifically hybridizing to the target nucleic acid they are to detect.

In addition to test probes that bind the target nucleic acid(s) of interest, the high-density array can contain a number of control probes. The control probes fall into three categories referred to herein as (1) normalization controls; (2) expression level controls; and (3) mismatch controls.

Normalization controls are oligonucleotide or other nucleic acid probes that are complementary to labeled reference oligonucleotides or other nucleic acid sequences that are added to the nucleic acid sample. The signals obtained from the normalization controls after hybridization provide a control for variations in hybridization conditions, label intensity, “reading” efficiency and other factors that can cause the signal of a perfect hybridization to vary between arrays. In some embodiments, signals (e.g., fluorescence intensity) read from some or all other probes in the array are divided by the signal (e.g., fluorescence intensity) from the control probes, thereby normalizing the measurements.
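The division step described above can be sketched in Python as follows. The sketch assumes, as one possible choice not specified in the paragraph, that when several normalization control probes are present their mean signal is used as the divisor; the probe identifiers and intensities are hypothetical.

```python
from statistics import mean


def normalize_to_controls(signals, control_ids):
    """Divide each non-control probe signal by the mean signal of the normalization controls.

    signals: mapping of probe identifier -> raw signal (e.g., fluorescence intensity)
    control_ids: identifiers of the normalization control probes
    """
    controls = set(control_ids)
    control_signal = mean([signals[c] for c in controls])
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    return {probe: value / control_signal
            for probe, value in signals.items() if probe not in controls}


# Hypothetical raw intensities from one array.
signals = {"ctrl_1": 1000, "ctrl_2": 1200, "probe_A": 2200, "probe_B": 550}
normalized = normalize_to_controls(signals, {"ctrl_1", "ctrl_2"})
print(normalized)  # probe_A -> 2.0, probe_B -> 0.5 (control mean = 1100)
```

Because every probe on an array is divided by the same control-derived value, array-to-array differences in label intensity or reading efficiency cancel out when normalized values are compared across arrays.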

Virtually any probe can serve as a normalization control. However, it is recognized that hybridization efficiency varies with base composition and probe length. Exemplary normalization probes can be selected to reflect the average length of the other probes present in the array; however, they can be selected to cover a range of lengths. The normalization control(s) can also be selected to reflect the (average) base composition of the other probes in the array; however, in some embodiments, only one or a few probes are used and they are selected such that they hybridize well (i.e., no secondary structure) and do not match any target-specific probes.

Expression level controls are probes that hybridize specifically with constitutively expressed genes in the biological sample. Virtually any constitutively expressed gene provides a suitable target for expression level controls. Typical expression level control probes have sequences complementary to subsequences of constitutively expressed “housekeeping genes” including, but not limited to, the β-actin gene, the transferrin receptor gene, the GAPDH gene, and the like. Exemplary human housekeeping genes are disclosed in Eisenberg & Levanon, 2003. It is noted that certain of the genes listed in Eisenberg & Levanon, 2003 are also listed in one or more of Tables 2-5. In some embodiments, a gene that appears in Eisenberg & Levanon, 2003 and also in one or more of Tables 2-5 is not selected for use as an expression level control.

Mismatch controls can also be provided for the probes to the target genes, for expression level controls or for normalization controls. Mismatch controls are oligonucleotide probes or other nucleic acid probes identical to their corresponding test or control probes except for the presence of one or more mismatched bases. A mismatched base is a base selected so that it is not complementary to the corresponding base in the target sequence to which the probe would otherwise specifically hybridize. One or more mismatches are selected such that under appropriate hybridization conditions (e.g., stringent conditions) the test or control probe would be expected to hybridize with its target sequence, but the mismatch probe would not hybridize (or would hybridize to a significantly lesser extent). In some embodiments, mismatch probes contain one or more central mismatches. Thus, for example, where a probe is a 20-mer, a corresponding mismatch probe will have the identical sequence except for a single base mismatch (e.g., substituting a G, a C, or a T for an A) at any of positions 6 through 14 (the central mismatch).
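The construction of central mismatch probes described above can be sketched in Python as follows. This is an illustrative sketch only: it enumerates every single-base substitution at 1-based positions 6 through 14 of a 20-mer, whereas a given array design might use only one substitution per probe; the function names and the example probe sequence are hypothetical.

```python
BASES = "ACGT"


def mismatch_probe(pm_probe: str, position: int, new_base: str) -> str:
    """Return pm_probe with a single substitution at 1-based `position`."""
    pm_probe = pm_probe.upper()
    if not 1 <= position <= len(pm_probe):
        raise ValueError("position out of range")
    if new_base not in BASES or new_base == pm_probe[position - 1]:
        raise ValueError("new_base must be an A/C/G/T base different from the original")
    i = position - 1
    return pm_probe[:i] + new_base + pm_probe[i + 1:]


def central_mismatch_probes(pm_probe: str, first: int = 6, last: int = 14):
    """All single-base mismatch variants at 1-based positions first..last (inclusive)."""
    pm_probe = pm_probe.upper()
    return [
        (pos, base, mismatch_probe(pm_probe, pos, base))
        for pos in range(first, last + 1)
        for base in BASES
        if base != pm_probe[pos - 1]
    ]


pm = "ATGCGTACGTTAGCCGATAA"  # hypothetical 20-mer perfect match probe
variants = central_mismatch_probes(pm)
print(len(variants))  # 9 central positions x 3 alternative bases = 27
print(mismatch_probe(pm, 10, "A"))  # T at position 10 replaced by A
```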

Mismatch probes thus provide a control for non-specific binding or cross-hybridization to a nucleic acid in the sample other than the target to which the probe is directed. Mismatch probes also indicate whether a given hybridization is specific or not. For example, if the target is present, the perfect match probes should be consistently brighter than the mismatch probes. In addition, if all central mismatches are present, the mismatch probes can be used to detect a mutation. The difference in intensity between the perfect match and the mismatch probe (I(PM) − I(MM)) provides a good measure of the concentration of the hybridized material.
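The two uses of perfect match/mismatch pairs noted above, the I(PM) − I(MM) difference as a measure of hybridized material and the expectation that PM probes are consistently brighter when the target is present, can be sketched in Python as follows. Averaging the differences over the probe pairs for one target is an assumption made for illustration, and the intensities are hypothetical.

```python
from statistics import mean


def pm_mm_summary(pairs):
    """Summarize PM/MM probe pairs for one target.

    pairs: list of (pm_intensity, mm_intensity) tuples.
    Returns (mean of I(PM) - I(MM), fraction of pairs in which PM is brighter than MM).
    """
    if not pairs:
        raise ValueError("no probe pairs supplied")
    diffs = [pm - mm for pm, mm in pairs]
    fraction_pm_brighter = sum(d > 0 for d in diffs) / len(diffs)
    return mean(diffs), fraction_pm_brighter


# Hypothetical probe pairs for one target gene.
pairs = [(1200, 300), (900, 250), (1500, 400), (300, 350)]
mean_diff, fraction = pm_mm_summary(pairs)
print(mean_diff, fraction)  # 650 0.75
```

A low fraction of pairs with PM brighter than MM would suggest that the signal for that target is dominated by non-specific binding.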

V.C. Nucleic Acid Samples

A biological sample that can be analyzed in accordance with the presently disclosed subject matter comprises in some embodiments a nucleic acid. The terms “nucleic acid,” “nucleic acids,” and “nucleic acid molecules” each refer in some embodiments to deoxyribonucleotides, ribonucleotides, and polymers and folded structures thereof in either single- or double-stranded form. Nucleic acids can be derived from any source, including any organism. Deoxyribonucleic acids can comprise genomic DNA, cDNA derived from ribonucleic acid, DNA from an organelle (e.g., mitochondrial DNA or chloroplast DNA), or combinations thereof. Ribonucleic acids can comprise genomic RNA (e.g., viral genomic RNA), messenger RNA (mRNA), ribosomal RNA (rRNA), transfer RNA (tRNA), or combinations thereof.

V.C.1. Isolation of Nucleic Acid Samples

Nucleic acid samples used in the methods and assays of the presently disclosed subject matter can be prepared by any available method or process. Methods of isolating total mRNA are also known to those of skill in the art. For example, methods of isolation and purification of nucleic acids are described in detail in Chapter 3 of Tijssen, 1993. Such samples include RNA samples, but also include cDNA synthesized from an mRNA sample isolated from a cell or tissue of interest. Such samples also include DNA amplified from the cDNA, an RNA transcribed from the amplified DNA, and combinations thereof. One of skill in the art would appreciate that it can be desirable to inhibit or destroy RNase present in homogenates before homogenates are used as a source of RNA.

The presently disclosed subject matter encompasses use of a sufficiently large biological sample to enable a comprehensive survey of low abundance nucleic acids in the sample. Thus, the sample can optionally be concentrated prior to isolation of nucleic acids. Several protocols for concentration have been developed that alternatively use slide supports (Kohsaka & Carson, 1994; Millar et al., 1995), filtration columns (Bej et al., 1991), or immunomagnetic beads (Albert et al., 1992; Cousins et al., 1992). Such approaches can significantly increase the sensitivity of subsequent detection methods.

As one example, SEPHADEX® matrix (Sigma of St. Louis, Mo., United States of America) is a matrix of diatomaceous earth and glass suspended in a solution of chaotropic agents and has been used to bind nucleic acid material (Boom et al., 1990; Buffone et al., 1991). After the nucleic acid is bound to the solid support material, impurities and inhibitors are removed by washing and centrifugation, and the nucleic acid is then eluted into a standard buffer. Target capture also allows the target sample to be concentrated into a minimal volume, facilitating the automation and reproducibility of subsequent analyses (Lanciotti et al., 1992).

Methods for nucleic acid isolation can comprise simultaneous isolation of total nucleic acid, or separate and/or sequential isolation of individual nucleic acid types (e.g., genomic DNA, cDNA, organelle DNA, genomic RNA, mRNA, poly A⁺ RNA, rRNA, tRNA) followed by optional combination of multiple nucleic acid types into a single sample.

When RNA (e.g., mRNA) is selected for analysis, the disclosed methods allow for an assessment of gene expression in the tissue or cell type from which the RNA was isolated. RNA isolation methods are known to one of skill in the art. See Albert et al., 1992; Busch et al., 1992; Hamel et al., 1995; Herrewegh et al., 1995; Izraeli et al., 1991; McCaustland et al., 1991; Natarajan et al., 1994; Rupp et al., 1988; Tanaka et al., 1994; and Van Kerckhoven et al., 1994.

Simple and semi-automated extraction methods can also be used for nucleic acid isolation, including for example, the SPLIT SECOND™ system (Boehringer Mannheim of Indianapolis, Ind., United States of America), the TRIZOL™ Reagent system (Life Technologies of Gaithersburg, Md., United States of America), and the FASTPREP™ system (Bio 101 of La Jolla, Calif., United States of America). See also Smith 1998a; and Paladichuk 1999.

In some embodiments, nucleic acids that are used for subsequent amplification and labeling are analytically pure as determined by spectrophotometric measurements or by visual inspection following electrophoretic resolution. In some embodiments, the nucleic acid sample is free of contaminants such as polysaccharides, proteins, and inhibitors of enzyme reactions. When a biological sample comprises an RNA molecule that is intended for use in producing a probe, it is preferably free of DNase and RNase. Contaminants and inhibitors can be removed or substantially reduced using resins for DNA extraction (e.g., CHELEX™ 100 from Bio-Rad Laboratories of Hercules, Calif., United States of America) or by standard phenol extraction and ethanol precipitation.

V.C.2. Amplification of Nucleic Acid Samples

In some embodiments, a nucleic acid isolated from a biological sample is amplified prior to being used in the methods disclosed herein. In some embodiments, the nucleic acid is an RNA molecule, which is converted to a complementary DNA (cDNA) prior to amplification. Techniques for the isolation of RNA molecules and the production of cDNA molecules from the RNA molecules are known (see generally, Silhavy et al., 1984; Sambrook & Russell, 2001; Ausubel et al., 2002; and Ausubel et al., 2003). In some embodiments, the amplification of RNA molecules isolated from a biological sample is a quantitative amplification (e.g., by quantitative RT-PCR).
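One widely used way to turn quantitative RT-PCR threshold cycle (Ct) values into relative expression is the comparative Ct (2^−ΔΔCt) method. The paragraph above does not specify a quantification method, so the following Python sketch is an assumption offered for illustration: it assumes an amplification efficiency of about 100% (a factor of 2 per cycle), a reference gene such as an expression level control, and hypothetical Ct values.

```python
def relative_expression_ddct(ct_target_sample, ct_ref_sample,
                             ct_target_calibrator, ct_ref_calibrator,
                             efficiency=2.0):
    """Fold change of a target gene in a sample relative to a calibrator sample.

    Uses the comparative Ct method: efficiency ** -(dCt_sample - dCt_calibrator),
    where dCt = Ct(target) - Ct(reference gene). efficiency=2.0 assumes doubling per cycle.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    delta_delta = delta_sample - delta_calibrator
    return efficiency ** (-delta_delta)


# Hypothetical Ct values: the target crosses threshold 2 cycles earlier
# (relative to the reference gene) in the sample than in the calibrator.
fold = relative_expression_ddct(24.0, 18.0, 26.0, 18.0)
print(fold)  # 4.0
```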

The terms “template nucleic acid” and “target nucleic acid” as used herein each refer to nucleic acids isolated from a biological sample as described herein above. The terms “template nucleic acid pool,” “template pool,” “target nucleic acid pool,” and “target pool” each refer to an amplified sample of “template nucleic acid.” Thus, a target pool comprises amplicons generated by performing an amplification reaction using the template nucleic acid. In some embodiments, a target pool is amplified using a random amplification procedure as described herein.

The term “target-specific primer” refers to a primer that hybridizes selectively and predictably to a target sequence, for example a subsequence of one of the genes disclosed herein, in a target nucleic acid sample. A target-specific primer can be selected or synthesized to be complementary to known nucleotide sequences of target nucleic acids.

The term “random primer” refers to a primer having an arbitrary sequence. The nucleotide sequence of a random primer can be known, although such sequence is considered arbitrary in that it is not specifically designed for complementarity to a nucleotide sequence of the presently disclosed subject matter. The term “random primer” encompasses selection of an arbitrary sequence having increased probability to be efficiently utilized in an amplification reaction. For example, the Random Oligonucleotide Construction Kit (ROCK) is a macro-based program that facilitates the generation and analysis of random oligonucleotide primers (Strain & Chmielewski, 2001). Representative primers include but are not limited to random hexamers and random amplified polymorphic DNA (RAPD)-type primers as described by Williams et al., 1990.

A random primer can also be degenerate or partially degenerate as described by Telenius et al., 1992. Briefly, degeneracy can be introduced by selecting alternate oligonucleotide sequences that encode the same amino acid sequence.
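The principle of introducing degeneracy by using alternate sequences that encode the same amino acid sequence can be illustrated by enumerating every DNA sequence that encodes a short peptide under the standard genetic code. This Python sketch illustrates the principle only and is not the Telenius et al. protocol; the function name and example peptides are hypothetical, and an unrecognized amino acid letter raises a KeyError.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Reverse mapping: amino acid -> list of synonymous codons.
CODONS_FOR = {}
for codon, aa in CODON_TABLE.items():
    CODONS_FOR.setdefault(aa, []).append(codon)


def degenerate_primer_pool(peptide: str):
    """All DNA sequences that encode `peptide` (one-letter amino acid codes)."""
    codon_choices = [CODONS_FOR[aa] for aa in peptide.upper()]
    return ["".join(codons) for codons in product(*codon_choices)]


pool = degenerate_primer_pool("MKW")
print(pool)  # ['ATGAAATGG', 'ATGAAGTGG']: Lys has two codons, Met and Trp one each
print(len(degenerate_primer_pool("MKFW")))  # 1 x 2 x 2 x 1 = 4
```

The size of the pool grows multiplicatively with each amino acid, which is why degenerate primers are usually designed over stretches rich in amino acids with few codons (e.g., Met, Trp).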

In some embodiments, random primers can be prepared by shearing or digesting a portion of the template nucleic acid sample. Random primers so-constructed comprise a sample-specific set of random primers.

The term “heterologous primer” refers to a primer complementary to a sequence that has been introduced into the template nucleic acid pool. For example, a primer that is complementary to a linker or adaptor, as described below, is a heterologous primer. Representative heterologous primers can optionally include a poly(dT) primer, a poly(T) primer, or as appropriate, a poly(dA) or poly(A) primer.

The term “primer” as used herein refers to a contiguous sequence comprising in some embodiments about 6 or more nucleotides, in some embodiments about 10-20 nucleotides (e.g., 15-mer), and in some embodiments about 20-30 nucleotides (e.g., a 22-mer). Primers used to perform the methods of the presently disclosed subject matter encompass oligonucleotides of sufficient length and appropriate sequence so as to provide initiation of polymerization on a nucleic acid molecule.

U.S. Pat. No. 6,066,457 to Hampson et al. describes a method for substantially uniform amplification of a collection of single stranded nucleic acid molecules such as RNA. Briefly, the nucleic acid starting material is anchored and processed to produce a mixture of directional shorter random size DNA molecules suitable for amplification of the sample.

In accordance with the methods and systems of the presently disclosed subject matter, any PCR technique or related technique can be employed to perform the step of amplifying the nucleic acid sample. In addition, such methods can be optimized for amplification of a particular subset of nucleic acid (e.g., genomic DNA versus RNA), and representative optimization criteria and related guidance can be found in the art. See Cha & Thilly, 1993; Linz et al., 1990; Robertson & Walsh-Weller, 1998; Roux 1995; Williams 1989; and McPherson et al., 1995.

V.C.3. Labeling of Nucleic Acid Samples

Optionally, a nucleic acid sample (e.g., a quantitatively amplified RNA sample) further comprises a detectable label. In some embodiments of the presently disclosed subject matter, the amplified nucleic acids can be labeled prior to hybridization to an array. Alternatively, randomly amplified nucleic acids are hybridized with a set of probes, without prior labeling of the amplified nucleic acids. For example, an unlabeled nucleic acid in the biological sample can be detected by hybridization to a labeled probe. In some embodiments, both the randomly amplified nucleic acids and the one or more probes include a label, wherein the proximity of the labels following hybridization enables detection. An exemplary procedure using nucleic acids labeled with chromophores and fluorophores to generate detectable photonic structures is described in U.S. Pat. No. 6,162,603 to Heller.

In accordance with the methods and systems of the presently disclosed subject matter, the amplified nucleic acids and/or probes/probe sets can be labeled using any detectable label. It will be understood to one of skill in the art that any suitable method for labeling can be used, and no particular detectable label or technique for labeling should be construed as a limitation of the disclosed methods.

Direct labeling techniques include incorporation of radioisotopic or fluorescent nucleotide analogues into nucleic acids by enzymatic synthesis in the presence of labeled nucleotides or labeled PCR primers. A radioisotopic label can be detected using autoradiography or phosphorimaging. A fluorescent label can be detected directly using emission and absorbance spectra that are appropriate for the particular label used. Any detectable fluorescent dye can be used, including but not limited to FITC (fluorescein isothiocyanate), FLUOR X™, ALEXA FLUOR® 488, OREGON GREEN® 488, 6-JOE (6-carboxy-4′,5′-dichloro-2′,7′-dimethoxyfluorescein, succinimidyl ester), ALEXA FLUOR® 532, Cy3, ALEXA FLUOR® 546, TMR (tetramethylrhodamine), ALEXA FLUOR® 568, ROX (X-rhodamine), ALEXA FLUOR® 594, TEXAS RED®, BODIPY® 630/650, and Cy5 (available from Amersham Pharmacia Biotech of Piscataway, N.J., United States of America or from Molecular Probes Inc. of Eugene, Oreg., United States of America). Fluorescent tags also include sulfonated cyanine dyes (available from Li-Cor, Inc. of Lincoln, Nebr., United States of America) that can be detected using infrared imaging. Methods for direct labeling of a heterogeneous nucleic acid sample are known in the art and representative protocols can be found in, for example, DeRisi et al., 1996; Sapolsky & Lipshutz, 1996; Schena et al., 1995; Schena et al., 1996; Shalon et al., 1996; Shoemaker et al., 1996; and Wang et al., 1989.

In some embodiments, nucleic acid molecules isolated from different cell types (e.g., primary versus metastatic PDAC) are labeled with different detectable markers, allowing the nucleic acids to be analyzed simultaneously on an array. For example, a first RNA sample can be reverse transcribed into cDNAs labeled with cyanine 3 (a green dye fluorophore; Cy3) while a second RNA sample to which the first RNA sample is to be compared can be labeled with cyanine 5 (a red dye fluorophore; Cy5).
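In such a two-color experiment, the two samples are commonly compared feature by feature as a log2 ratio of the two channel intensities. The following Python sketch is illustrative and rests on stated assumptions: a single background value per channel is subtracted, background-subtracted values are clamped to a floor of 1.0 so the logarithm is defined (a hypothetical choice), and the gene names and intensities are hypothetical.

```python
import math


def log2_ratios(cy5, cy3, cy5_bg=0.0, cy3_bg=0.0, floor=1.0):
    """Per-feature log2(Cy5 / Cy3) after background subtraction.

    cy5, cy3: mappings of feature identifier -> raw channel intensity.
    Positive values indicate higher signal in the Cy5-labeled (second) sample.
    """
    ratios = {}
    for feature in cy5.keys() & cy3.keys():
        red = max(cy5[feature] - cy5_bg, floor)
        green = max(cy3[feature] - cy3_bg, floor)
        ratios[feature] = math.log2(red / green)
    return ratios


# Hypothetical intensities: first sample labeled with Cy3, second with Cy5.
cy3 = {"gene_A": 1100, "gene_B": 2100}
cy5 = {"gene_A": 4100, "gene_B": 600}
ratios = log2_ratios(cy5, cy3, cy5_bg=100, cy3_bg=100)
print(ratios)  # gene_A: 2.0 (4-fold higher in Cy5 sample), gene_B: -2.0
```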

The quality of probe or nucleic acid sample labeling can be approximated by determining the specific activity of label incorporation. For example, in the case of a fluorescent label, the specific activity of incorporation can be determined by the absorbance at 260 nm and 550 nm (for Cy3) or 650 nm (for Cy5) using published extinction coefficients (Randolph & Waggoner, 1995). Very high label incorporation (specific activities of >1 fluorescent molecule/20 nucleotides) can result in a decreased hybridization signal compared with probe with lower label incorporation. Very low specific activity (<1 fluorescent molecule/100 nucleotides) can give unacceptably low hybridization signals. See Worley et al., 2000. Thus, it will be understood to one of skill in the art that labeling methods can be optimized for performance in microarray hybridization assay, and that optimal labeling can be unique to each label type.
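The specific-activity estimate described above can be sketched in Python as a nucleotides-per-dye calculation from absorbance readings, compared against the approximately 1-in-20 and 1-in-100 limits given in the paragraph. The extinction coefficients and 260 nm correction factors below are commonly quoted values included here as assumptions; they should be replaced with the published values the user relies on (e.g., Randolph & Waggoner, 1995). Function names are hypothetical.

```python
# Assumed, commonly quoted constants (units: M^-1 cm^-1); verify before use.
DYES = {
    "Cy3": {"wavelength_nm": 550, "epsilon": 150_000, "cf260": 0.08},
    "Cy5": {"wavelength_nm": 650, "epsilon": 250_000, "cf260": 0.05},
}
EPSILON_PER_BASE = 8_919  # assumed average per-nucleotide coefficient at 260 nm


def nucleotides_per_dye(a260, a_dye, dye="Cy3", epsilon_base=EPSILON_PER_BASE):
    """Estimated number of nucleotides per incorporated dye molecule.

    a260: absorbance at 260 nm; a_dye: absorbance at the dye maximum (550 or 650 nm).
    The dye's own contribution at 260 nm is subtracted using its correction factor.
    """
    params = DYES[dye]
    a_base = a260 - a_dye * params["cf260"]
    if a_dye <= 0 or a_base <= 0:
        raise ValueError("absorbance values must give positive base and dye signals")
    return (a_base * params["epsilon"]) / (a_dye * epsilon_base)


def classify_labeling(n_per_dye, high_limit=20, low_limit=100):
    """Flag incorporation above ~1 dye/20 nt or below ~1 dye/100 nt."""
    if n_per_dye < high_limit:
        return "too high"
    if n_per_dye > low_limit:
        return "too low"
    return "acceptable"


n = nucleotides_per_dye(a260=0.5, a_dye=0.1, dye="Cy3")
print(round(n, 1), classify_labeling(n))  # about 82.7 nucleotides per dye, acceptable
```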

V.D. Forming High-Density Arrays

In some embodiments of the presently disclosed subject matter, probes or probe sets are immobilized on a solid support such that a position on the support identifies a particular probe or probe set. In the case of a probe set, constituent probes of the probe set can be combined prior to placement on the solid support or placed serially at the same position on the solid support.

A microarray can be assembled using any suitable method known to one of skill in the art, and any one microarray configuration or method of construction is not considered to be a limitation of the presently disclosed subject matter. Representative microarray formats that can be used in accordance with the methods of the presently disclosed subject matter are described herein below and include, but are not limited to, light-directed chemical coupling and mechanically directed coupling (see U.S. Pat. No. 5,143,854 to Pirrung et al.; U.S. Pat. No. 5,800,992 to Fodor et al.; and U.S. Pat. No. 5,837,832 to Chee et al.).

V.D.1. Array Substrate and Configuration

The substrate for printing the array should be substantially rigid and amenable to DNA immobilization and detection methods (e.g., in the case of fluorescent detection, the substrate must have low background fluorescence in the region of the fluorescent dye excitation wavelengths). The substrate can be nonporous or porous as determined most suitable for a particular application. Representative substrates include but are not limited to a glass microscope slide, a glass coverslip, silicon, plastic, a polymer matrix, an agar gel, a polyacrylamide gel, and a membrane, such as a nylon, nitrocellulose or ANAPORE™ (Whatman of Maidstone, United Kingdom) membrane.

Porous substrates (membranes and polymer matrices) are preferred in that they permit immobilization of relatively large amounts of probe molecules and provide a three-dimensional hydrophilic environment for biomolecular interactions to occur (Dubiley et al., 1997; Yershov et al., 1996). A BIOCHIP ARRAYER™ dispenser (Packard Instrument Company of Meriden, Conn., United States of America) can effectively dispense probes onto membranes such that the spot size is consistent among spots whether one, two, or four droplets were dispensed per spot (Englert, 2000).

A microarray substrate for use in accordance with the methods of the presently disclosed subject matter can have either a two-dimensional (planar) or a three-dimensional (non-planar) configuration. An exemplary three-dimensional microarray is the FLOW-THRU™ chip (Gene Logic, Inc. of Gaithersburg, Md., United States of America), which has implemented a gel pad to create a third dimension. Such a three-dimensional microarray can be constructed of any suitable substrate, including glass capillary, silicon, metal oxide filters, or porous polymers. See Yang et al., 1998.

Briefly, a FLOW-THRU™ chip (Gene Logic, Inc.) comprises a uniformly porous substrate having pores or microchannels connecting upper and lower faces of the chip. Probes are immobilized on the walls of the microchannels and a hybridization solution comprising sample nucleic acids can flow through the microchannels. This configuration increases the capacity for probe and target binding by providing additional surface relative to two-dimensional arrays. See U.S. Pat. No. 5,843,767 to Beattie.

V.D.2. Surface Chemistry

The particular surface chemistry employed is inherent in the microarray substrate and substrate preparation. Post-synthesis immobilization of nucleic acid probes can be accomplished by various approaches, including adsorption, entrapment, and covalent attachment. Typically, the binding technique is designed not to disrupt the activity of the probe.

For substantially permanent immobilization, covalent attachment is generally performed. Because few organic functional groups react with an activated silica surface, an intermediate layer is advisable for substantially permanent probe immobilization. Functionalized organosilanes can be used as such an intermediate layer on glass and silicon substrates (Liu & Hlady, 1996; Shriver-Lake, 1998). A hetero-bifunctional cross-linker requires that the probe have a different chemistry than the surface, and is preferred because it avoids linking reactive groups of the same type to each other. A representative hetero-bifunctional cross-linker is N-(gamma-maleimidobutyryloxy)succinimide ester (GMBS), which can couple a primary amine of a probe (through its succinimide ester) to a thiol group (through its maleimide). Procedures for using such linkers are known to one of skill in the art and are summarized in Hermanson, 1990. A representative protocol for covalent attachment of DNA to silicon wafers is described by O'Donnell et al., 1997.

When using a glass substrate, the glass should be substantially free of debris and other deposits and should have a substantially uniform coating. Organic compounds deposited on slides during their manufacture can be removed by pretreatment, for example by washing in hot nitric acid. Cleaned slides can then be coated with 3-aminopropyltrimethoxysilane using vapor-phase techniques. After silane deposition, slides are washed with deionized water to remove any silane that is not attached to the glass and to promote cross-linking of unreacted methoxy groups to neighboring silane moieties on the slide. The uniformity of the coating can be assessed by known methods, for example electron spectroscopy for chemical analysis (ESCA) or ellipsometry (Ratner & Castner, 1997; Schena et al., 1995). See also Worley et al., 2000.

For attachment of probes longer than about 300 base pairs, noncovalent binding is suitable. A representative technique for noncovalent linkage involves use of sodium isothiocyanate (NaSCN) in the spotting solution. When using this method, amino-silanized slides are typically employed because this coating improves nucleic acid binding when compared to bare glass. This method works well for spotting applications that use a nucleic acid concentration of about 100 ng/μl (Worley et al., 2000).
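To relate a spotting concentration such as 100 ng/μl to a molar amount of probe, a simple unit conversion can help. The sketch below is illustrative only: it assumes an average molar mass of about 660 g/mol per base pair of double-stranded DNA (a common rule of thumb, not a value taken from this disclosure), and the 500 bp probe length in the usage line is hypothetical.

```python
# Approximate average molar mass of one base pair of double-stranded DNA (g/mol).
# 660 g/mol is a commonly used rule-of-thumb value and is an assumption here.
AVG_BP_MASS_G_PER_MOL = 660.0


def ng_per_ul_to_pmol_per_ul(conc_ng_per_ul: float, length_bp: int) -> float:
    """Convert a dsDNA concentration from ng/ul to pmol/ul (approximate)."""
    if length_bp <= 0:
        raise ValueError("length_bp must be a positive number of base pairs")
    grams_per_ul = conc_ng_per_ul * 1e-9  # ng -> g
    mol_per_ul = grams_per_ul / (length_bp * AVG_BP_MASS_G_PER_MOL)
    return mol_per_ul * 1e12  # mol -> pmol


if __name__ == "__main__":
    # Hypothetical 500 bp probe spotted at about 100 ng/ul.
    print(f"{ng_per_ul_to_pmol_per_ul(100.0, 500):.3f} pmol/ul")
```

For the hypothetical 500 bp probe this gives roughly 0.3 pmol/μl; longer probes at the same mass concentration correspond to proportionally fewer molecules.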

In the case of nitrocellulose or nylon membranes, the chemistry of nucleic acid binding to these membranes has been well characterized (Southern, 1975; Sambrook & Russell, 2001).

V.D.3. Arraying Techniques

A microarray for the analysis of gene expression in a biological sample can be constructed using any one of several methods available in the art, including but not limited to photolithographic and microfluidic methods, further described herein below. In some embodiments, the method of construction is flexible, such that a microarray can be tailored for a particular purpose.

Exemplary arraying techniques include, but are not limited to, light-directed synthesis (Fodor et al., 1991; Fodor et al., 1993), commercialized by Affymetrix of Santa Clara, Calif., United States of America; Digital Optical Chemistry (PCT International Patent Application Publication No. WO 1999/063385; Warrington et al., 2000); Contact Printing (Maier et al., 1994; Mace et al., 2000; Rose, 2000); Noncontact Ink-Jet Printing (U.S. Pat. No. 5,965,352 to Stoughton & Friend; see also Theriault et al., 1999); Syringe-Solenoid Printing (U.S. Pat. Nos. 5,743,960 and 5,916,524, both to Tisone); Electronic Addressing (U.S. Pat. No. 6,225,059 to Ackley et al. and PCT International Patent Application Publication No. WO 2001/023082); and Nanoelectrode Synthesis (U.S. Pat. No. 6,123,819 to Peeters).

In addition to the foregoing, other methods that can be used to generate an array of oligonucleotides on a single substrate are described in PCT International Patent Application Publication No. WO 1993/009668. High-density nucleic acid arrays can also be fabricated by depositing pre-made and/or natural nucleic acids in predetermined positions. Synthesized or natural nucleic acids are deposited on specific locations of a substrate by light-directed targeting and oligonucleotide-directed targeting. A dispenser that moves from region to region to deposit nucleic acids in specific spots can also be employed.

V.E. Hybridization

V.E.1. General Considerations

The terms “specifically hybridizes” and “selectively hybridizes” each refer to binding, duplexing, or hybridizing of a molecule only to a particular nucleotide sequence under stringent conditions when that sequence is present in a complex nucleic acid mixture (e.g., total cellular DNA or RNA).

The phrase “substantially hybridizes” refers to complementary hybridization between a probe nucleic acid molecule and a substantially identical target nucleic acid molecule as defined herein. Substantial hybridization is generally permitted by reducing the stringency of the hybridization conditions using art-recognized techniques.

“Stringent hybridization conditions” and “stringent hybridization wash conditions” in the context of nucleic acid hybridization experiments are both sequence- and environment-dependent. Longer sequences hybridize specifically at higher temperatures. Generally, highly stringent hybridization and wash conditions are selected to be about 5° C. lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. Very stringent conditions are selected to be equal to the Tm for a particular probe. Typically, under “stringent conditions” a probe hybridizes specifically to its target sequence, but to no other sequences.
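The stringency rules above can be illustrated numerically. The sketch below estimates Tm with one widely used empirical, salt-adjusted approximation, Tm ≈ 81.5 + 16.6 log10[Na+] + 0.41 (%GC) - 600/N, where N is the probe length in nucleotides. This formula and its coefficients are an assumption, not part of this disclosure; published variants differ, and its range of validity depends on the source. The sketch then applies the "about 5° C. below Tm" (highly stringent) and "equal to Tm" (very stringent) rules stated above. The probe sequence is hypothetical.

```python
import math


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a DNA sequence."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def approx_tm(seq: str, na_molar: float) -> float:
    """Empirical salt-adjusted Tm estimate in deg C (approximation only)."""
    if na_molar <= 0:
        raise ValueError("na_molar must be positive")
    return (81.5 + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent(seq) - 600.0 / len(seq))


def stringency_temperatures(seq: str, na_molar: float) -> dict:
    """Apply the 'Tm - 5 deg C' (highly stringent) and 'Tm' (very stringent) rules."""
    tm = approx_tm(seq, na_molar)
    return {"tm": tm, "highly_stringent": tm - 5.0, "very_stringent": tm}


if __name__ == "__main__":
    probe = "ACGT" * 10  # hypothetical 40-mer with 50% GC
    for name, temp in stringency_temperatures(probe, na_molar=1.0).items():
        print(f"{name}: {temp:.1f} C")
```

The -600/N term is what makes longer probes melt at higher temperatures in this approximation, consistent with the observation that longer sequences hybridize specifically at higher temperatures.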

An extensive guide to the hybridization of nucleic acids is found in Tijssen, 1993. In general, a signal at least 2-fold higher than that observed for a negative control probe in the same hybridization assay indicates detection of specific or substantial hybridization.
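The 2-fold criterion above amounts to a simple ratio test against the negative control. A minimal sketch, assuming background-corrected intensities; the probe names and values in the usage lines are hypothetical.

```python
def hybridization_calls(signals: dict, negative_control: float,
                        min_fold: float = 2.0) -> dict:
    """Flag probes whose signal is at least `min_fold` times the negative control.

    signals: mapping of probe name -> (background-corrected) signal intensity.
    """
    if negative_control <= 0:
        raise ValueError("negative control signal must be positive")
    return {probe: value / negative_control >= min_fold
            for probe, value in signals.items()}


if __name__ == "__main__":
    # Hypothetical intensities from one hybridization assay.
    calls = hybridization_calls({"probe_1": 950.0, "probe_2": 310.0},
                                negative_control=400.0)
    print(calls)
```

Keeping `min_fold` as a parameter allows a stricter threshold to be applied than the 2-fold minimum described above.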

V.E.2. Hybridization on a Solid Support

In some embodiments of the presently disclosed subject matter, an amplified and/or labeled nucleic acid sample is hybridized to specific probes or probe sets that are immobilized on a continuous solid support comprising a plurality of identifying positions. Representative formats of such solid supports are described herein.

Examples of hybridization and wash conditions that can be employed are known to those of skill in the art (see Sambrook & Russell, 2001; Ausubel et al., 2002; and Ausubel et al., 2003; each of which is incorporated herein by reference in its entirety).

For some high-density glass-based microarray experiments, hybridization at 65° C. is too stringent for typical use, at least in part because the presence of fluorescent labels destabilizes the nucleic acid duplexes (Randolph & Waggoner, 1995). Alternatively, hybridization can be performed in a formamide-based hybridization buffer as described in Piétu et al., 1996.

A microarray format can be selected for use based on its suitability for electrochemical-enhanced hybridization. Provision of an electric current to the microarray, or to one or more discrete positions on the microarray facilitates localization of a target nucleic acid sample near probes immobilized on the microarray surface. Concentration of target nucleic acid near arrayed probe accelerates hybridization of a nucleic acid of the sample to a probe. Further, electronic stringency control allows the removal of unbound and nonspecifically bound DNA after hybridization. See U.S. Pat. No. 6,017,696 to Heller and U.S. Pat. No. 6,245,508 to Heller & Sosnowski.

V.E.3. Hybridization in Solution

In some embodiments of the presently disclosed subject matter, an amplified and/or labeled nucleic acid sample is hybridized to one or more probes in solution. Exemplary hybridization conditions are also disclosed in Sambrook & Russell, 2001; Ausubel et al., 2002; and Ausubel et al., 2003.

Alternate capture techniques can be used as will be understood by one of skill in the art, for example, purification by a metal affinity column when using probes comprising a histidine tag. As another example, the hybridized sample can be hydrolyzed by alkaline treatment wherein the double-stranded hybrids are protected while non-hybridizing single-stranded template and excess probe are hydrolyzed. The hybrids are then collected using any nucleic acid purification technique for further analysis.

To assess the expression of multiple genes and/or samples from multiple different sources simultaneously, probes or probe sets can be distinguished by differential labeling of probes or probe sets. Alternatively, probes or probe sets can be spatially separated in different hybridization vessels.

In some embodiments, a probe or probe set having a unique label is prepared for each gene or source to be detected. For example, a first probe or probe set can be labeled with a first fluorescent label, and a second probe or probe set can be labeled with a second fluorescent label. Multi-labeling experiments should consider label characteristics and detection techniques to optimize detection of each label. Representative first and second fluorescent labels are Cy3 and Cy5 (Amersham Pharmacia Biotech of Piscataway, N.J., United States of America), which can be analyzed with good contrast and minimal signal leakage.
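In two-color experiments, the two label signals at each spot are often summarized as a log ratio. The sketch below is one common way to do this and is an assumption, not a procedure specified here: local background is subtracted from each channel, negative or tiny values are floored so the ratio stays defined, log2(Cy5/Cy3) is computed per spot, and all ratios are median-centered as a simple global normalization. Spot names and intensities are hypothetical.

```python
import math
from statistics import median


def spot_log2_ratio(cy5_fg, cy5_bg, cy3_fg, cy3_bg, floor=1.0):
    """log2(Cy5/Cy3) after local background subtraction.

    Background-subtracted values are floored at `floor` so the ratio stays defined.
    """
    red = max(cy5_fg - cy5_bg, floor)
    green = max(cy3_fg - cy3_bg, floor)
    return math.log2(red / green)


def median_centered_ratios(spots):
    """Per-spot log2 ratios with the median ratio subtracted (global normalization).

    spots: mapping of spot name -> (cy5_fg, cy5_bg, cy3_fg, cy3_bg)
    """
    ratios = {name: spot_log2_ratio(*values) for name, values in spots.items()}
    center = median(ratios.values())
    return {name: r - center for name, r in ratios.items()}


if __name__ == "__main__":
    spots = {  # hypothetical foreground/background intensities
        "spot_a": (1100.0, 100.0, 600.0, 100.0),
        "spot_b": (600.0, 100.0, 600.0, 100.0),
        "spot_c": (350.0, 100.0, 1100.0, 100.0),
    }
    print(median_centered_ratios(spots))
```

Median-centering assumes that most spots are not differentially labeled between the two sources; other normalization schemes may be more appropriate for focused arrays.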

A unique label for each probe or probe set can further comprise a labeled microsphere to which a probe or probe set is attached. A representative system is LabMAP (Luminex Corporation of Austin, Tex., United States of America). Briefly, LabMAP (Laboratory Multiple Analyte Profiling) technology involves performing molecular reactions, including hybridization reactions, on the surface of color-coded microscopic beads called microspheres. When used in accordance with the methods of the presently disclosed subject matter, an individual probe or probe set is attached to beads having a single color-code such that they can be identified throughout the assay. Successful hybridization is measured using a detectable label of the amplified nucleic acid sample, wherein the detectable label can be distinguished from each color-code used to identify individual microspheres. Following hybridization of the randomly amplified, labeled nucleic acid sample with a set of microspheres comprising probe sets, the hybridization mixture is analyzed to detect the signal of the color-code as well as the label of a sample nucleic acid bound to the microsphere. See Vignali 2000; Smith et al., 1998b; and PCT International Patent Application Publication Nos. WO 2001/013120; WO 2001/014589; WO 1999/019515; WO 1999/032660; and WO 1997/014028.
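The bead-based readout described above pairs each bead's color code with the reporter signal of the bound sample. A minimal sketch of the data-reduction step, assuming each detected bead is reported as a hypothetical (color_code, reporter_intensity) pair and that the median reporter signal per color code is used as the summary; this is an illustrative assumption, not the LabMAP software interface.

```python
from collections import defaultdict
from statistics import median


def summarize_beads(events, region_to_probe):
    """Median reporter signal per probe from (color_code, reporter_intensity) events.

    region_to_probe: mapping of bead color code -> probe (or probe set) name.
    Events whose color code is not in `region_to_probe` are ignored.
    """
    by_region = defaultdict(list)
    for region, intensity in events:
        if region in region_to_probe:
            by_region[region].append(intensity)
    return {region_to_probe[r]: median(vals) for r, vals in by_region.items()}


if __name__ == "__main__":
    # Hypothetical bead events and color-code assignments.
    events = [(12, 100.0), (12, 300.0), (12, 200.0), (27, 50.0), (99, 1000.0)]
    print(summarize_beads(events, {12: "probe_A", 27: "probe_B"}))
```

Using the median rather than the mean limits the influence of occasional aggregated or misclassified beads.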

V.F. Detection

Methods and systems for detecting hybridization are typically selected according to the label employed.

In the case of a radioactive label (e.g., ³²P-dNTP), detection can be accomplished by autoradiography or by using a phosphorimager as is known to one of skill in the art. In some embodiments, a detection method is automated and adapted for simultaneous detection of numerous samples.

Common research equipment has been developed to perform high-throughput fluorescence detection, including instruments from GSI Lumonics (Watertown, Mass., United States of America), Amersham Pharmacia Biotech/Molecular Dynamics (Sunnyvale, Calif., United States of America), Applied Precision Inc. (Issaquah, Wash., United States of America), Genomic Solutions Inc. (Ann Arbor, Mich., United States of America), Genetic MicroSystems Inc. (Woburn, Mass., United States of America), Axon (Foster City, Calif., United States of America), Hewlett Packard (Palo Alto, Calif., United States of America), and Virtek (Woburn, Mass., United States of America). Most of the commercial systems use some form of scanning technology with photomultiplier tube detection. Criteria for consideration when analyzing fluorescent samples are summarized by Alexay et al., 1996.

In some embodiments, a nucleic acid sample or probe is labeled with far infrared, near infrared, or infrared fluorescent dyes. Following hybridization, the mixture of nucleic acids and probes is scanned photoelectrically with a laser diode and a sensor, wherein the laser scans with scanning light at a wavelength within the absorbance spectrum of the fluorescent label, and light is sensed at the emission wavelength of the label. See U.S. Pat. No. 6,086,737 to Patonay et al.; U.S. Pat. No. 5,571,388 to Patonay et al.; U.S. Pat. No. 5,346,603 to Middendorf & Brumbaugh; U.S. Pat. No. 5,534,125 to Middendorf et al.; U.S. Pat. No. 5,360,523 to Middendorf et al.; U.S. Pat. No. 5,230,781 to Middendorf & Patonay; U.S. Pat. No. 5,207,880 to Middendorf & Brumbaugh; and U.S. Pat. No. 4,729,947 to Middendorf & Brumbaugh. An ODYSSEY™ infrared imaging system (Li-Cor, Inc. of Lincoln, Nebr., United States of America) can be used for data collection and analysis.

If an epitope label has been used, a protein or compound that binds the epitope can be used to detect the epitope. For example, an enzyme-linked protein can be subsequently detected by development of a colorimetric or luminescent reaction product that is measurable using a spectrophotometer or luminometer, respectively.

In some embodiments, INVADER® technology (Third Wave Technologies of Madison, Wis., United States of America) is used to detect target nucleic acid/probe complexes. Briefly, a nucleic acid cleavage site (such as that recognized by a variety of enzymes having 5′ nuclease activity) is created on a target sequence, and the target sequence is cleaved in a site-specific manner, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. See U.S. Pat. No. 5,846,717 to Brow et al.; U.S. Pat. No. 5,985,557 to Prudent et al.; U.S. Pat. No. 5,994,069 to Hall et al.; U.S. Pat. No. 6,001,567 to Brow et al.; and U.S. Pat. No. 6,090,543 to Prudent et al.

In some embodiments, target nucleic acid/probe complexes are detected using an amplifying molecule, for example a poly-dA oligonucleotide as described by Lisle et al., 2001. Briefly, a tethered probe is employed against a target nucleic acid having a complementary nucleotide sequence. A target nucleic acid having a poly-dT sequence, which can be added to any nucleic acid sequence using methods known to one of skill in the art, hybridizes with an amplifying molecule comprising a poly-dA oligonucleotide. Short oligo-dT₄₀ signaling moieties are labeled with any suitable label (e.g., fluorescent, chemiluminescent, radioisotopic labels). The short oligo-dT₄₀ signaling moieties are subsequently hybridized along the molecule, and the label is detected.

The presently disclosed subject matter also envisions use of electrochemical technology for detecting a nucleic acid hybrid according to the disclosed method. In this case, the detection method relies on the inherent properties of DNA, and thus a detectable label on the target sample or the probe/probe set is not required. In some embodiments, probe-coupled electrodes are multiplexed to simultaneously detect multiple genes using any suitable microarray or multiplexed liquid hybridization format. To enable detection, gene-specific and control probes are synthesized with substitution of the non-physiological nucleic acid base inosine for guanine, and subsequently coupled to an electrode. Following hybridization of a nucleic acid sample with probe-coupled electrodes, a soluble redox-active mediator (e.g., ruthenium 2,2′-bipyridine) is added, and a potential is applied to the sample. In the absence of guanine, each mediator is oxidized only once. However, when a guanine-containing nucleic acid is present, by virtue of hybridization of a sample nucleic acid molecule to the probe, a catalytic cycle is created that results in the oxidation of guanine and a measurable current enhancement. See U.S. Pat. No. 6,127,127 to Eckhardt et al.; U.S. Pat. No. 5,968,745 to Thorp et al.; and U.S. Pat. No. 5,871,918 to Thorp et al.

Surface plasmon resonance spectroscopy can also be used to detect hybridization. See, e.g., Heaton et al., 2001; Nelson et al., 2001; and Guedon et al., 2000.

V.G. Data Analysis

Databases and software designed for use with microarrays are discussed in U.S. Pat. No. 6,229,911 to Balaban & Aggarwal, which describes a computer-implemented method for managing information, stored as indexed tables, collected from small or large numbers of microarrays, and in U.S. Pat. No. 6,185,561 to Balaban & Khurgin, which describes a computer-based method with data mining capability for collecting gene expression level data, adding additional attributes, and reformatting the data to produce answers to various queries. U.S. Pat. No. 5,974,164 to Chee discloses a software-based method for identifying mutations in a nucleic acid sequence based on differences in probe fluorescence intensities between wild type and mutant sequences that hybridize to reference sequences.

Analysis of microarray data can also be performed using the method disclosed in Tusher et al., 2001, which describes the Significance Analysis of Microarrays (SAM) method for determining significant differences in gene expression among two or more samples.
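The core quantity in SAM is a gene-wise "relative difference" d(i) = (mean of group 1 - mean of group 2) / (s(i) + s0), where s(i) = sqrt(a * (sum of squared deviations in both groups)), a = (1/n1 + 1/n2)/(n1 + n2 - 2), and s0 is a small positive constant. The sketch below computes d(i) only. Two simplifications are assumptions, not SAM as published: s0 defaults to the median of the s(i) values (SAM chooses s0 to minimize the coefficient of variation of d(i)), and the permutation-based estimate of the false discovery rate is omitted. Input values are hypothetical.

```python
import math
from statistics import fmean, median


def sam_relative_differences(group1, group2, s0=None):
    """SAM-style relative difference d(i) for each gene.

    group1, group2: lists of per-gene measurement lists (rows = genes,
    columns = samples), with genes in the same order in both groups.
    s0: small positive constant; if None, the median of the gene-wise s(i)
    is used as a simplification.
    """
    n1, n2 = len(group1[0]), len(group2[0])
    if n1 + n2 <= 2:
        raise ValueError("need more than two samples in total")
    a = (1.0 / n1 + 1.0 / n2) / (n1 + n2 - 2)
    diffs, scatters = [], []
    for x1, x2 in zip(group1, group2):
        m1, m2 = fmean(x1), fmean(x2)
        ss = sum((v - m1) ** 2 for v in x1) + sum((v - m2) ** 2 for v in x2)
        diffs.append(m1 - m2)
        scatters.append(math.sqrt(a * ss))
    if s0 is None:
        s0 = median(scatters)
    result = []
    for d, s in zip(diffs, scatters):
        if s + s0 == 0:
            raise ValueError("zero denominator; supply a positive s0")
        result.append(d / (s + s0))
    return result


if __name__ == "__main__":
    # Hypothetical log-expression values: 2 genes x 3 samples per group.
    g1 = [[1.0, 2.0, 3.0], [5.0, 5.5, 6.0]]
    g2 = [[4.0, 5.0, 6.0], [5.2, 5.4, 5.9]]
    print(sam_relative_differences(g1, g2))
```

Adding s0 to the denominator keeps genes with very small within-group scatter from receiving inflated scores.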

VI. Devices, Systems, and Compositions for Use in the Presently Disclosed Methods

The presently disclosed subject matter also provides devices, systems, and compositions that can be employed in the practice of the methods disclosed herein.

The methods and systems disclosed herein relate in some embodiments to generating gene expression profiles from biological samples that comprise PDAC cells obtained from a subject. The gene expression profiles are then, in some embodiments, compared to standards such as, but not limited to, gene expression profiles of metastatic PDAC cells and/or primary (i.e., non-metastatic) PDAC cells.

As such, the presently disclosed methods can employ various techniques to generate the gene expression profiles required for the comparisons. See, e.g., PCT International Patent Application Publication Nos. WO 2004/046098; WO 2004/110244; WO 2006/089268; WO 2007/001324; WO 2007/056332; WO 2007/070252, each of which is incorporated herein by reference in its entirety.

Generally, a gene expression profile can be generated using the following basic steps:

(1) a biological sample such as, but not limited to, a PDAC biopsy or resected PDAC cells is obtained; and
(2) the expression levels of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 25, 50, 100, or all) of the genes listed in Tables 2-5 are determined.
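Once expression levels have been determined, a profile can be compared to standards such as reference profiles of metastatic or primary PDAC cells. The sketch below shows one simple, hypothetical comparison: Pearson correlation between the sample profile and each reference profile over their shared genes, choosing the most correlated reference. This nearest-reference approach is an illustrative assumption, not the classification method of this disclosure, and the gene names and values are placeholders rather than data.

```python
import math
from statistics import fmean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant profiles")
    return sxy / math.sqrt(sxx * syy)


def closest_reference(sample, references):
    """Return the reference profile most correlated with `sample`, plus all scores.

    sample: dict gene -> expression level
    references: dict reference name -> dict gene -> expression level
    Only genes present in both the sample and a given reference are compared.
    """
    scores = {}
    for name, ref in references.items():
        genes = sorted(set(sample) & set(ref))
        if len(genes) < 2:
            raise ValueError(f"too few shared genes with reference {name!r}")
        scores[name] = pearson([sample[g] for g in genes],
                               [ref[g] for g in genes])
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Hypothetical gene names and values, for illustration only.
    sample = {"gene_1": 5.0, "gene_2": 1.0, "gene_3": 4.0}
    references = {
        "reference_A": {"gene_1": 6.0, "gene_2": 2.0, "gene_3": 5.0},
        "reference_B": {"gene_1": 1.0, "gene_2": 5.0, "gene_3": 2.0},
    }
    print(closest_reference(sample, references))
```

Correlation compares the shape of the profiles rather than absolute levels, which makes it less sensitive to overall scaling differences between platforms.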

As is known to one of ordinary skill in the art, gene expression levels can be assayed either at the level of RNA or at the level of protein. As such, in some embodiments RNA is extracted from the biological sample and analyzed by techniques that include, but are not limited to PCR analysis (in some embodiments, quantitative reverse transcription PCR) and/or array analysis. In each case, one of ordinary skill in the art would be aware of techniques that can be employed to determine the expression level of a gene product in the biological sample.
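For quantitative reverse transcription PCR, one widely used way to express relative expression is the comparative Ct (2^-ΔΔCt) method, in which the target gene's Ct is normalized to a reference gene and then to a calibrator sample. It is shown here as an illustrative assumption, not as the quantification method of this disclosure; it presumes near-100% amplification efficiency for both assays, and the Ct values are hypothetical.

```python
def delta_delta_ct_fold_change(ct_target_sample, ct_ref_sample,
                               ct_target_calibrator, ct_ref_calibrator):
    """Relative expression by the comparative Ct (2^-ddCt) method.

    Assumes near-100% amplification efficiency for target and reference assays.
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    ddct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-ddct)


if __name__ == "__main__":
    # Hypothetical Ct values: a target gene and a reference (housekeeping) gene
    # measured in a PDAC sample and in a calibrator sample.
    fold = delta_delta_ct_fold_change(22.0, 18.0, 25.0, 18.0)
    print(fold)  # 8.0
```

Here the target reaches threshold 3 cycles earlier in the sample than in the calibrator after reference-gene normalization, corresponding to an 8-fold higher relative level.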

With respect to PCR analyses, the sequences of nucleic acids that correspond to one or more of the genes listed in Tables 2-5 are present within the GENBANK® biosequence database, and oligonucleotide primers can be designed for the purpose of determining expression levels.
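When designing primers from database sequences, a quick screen of candidate pairs is often useful before any laboratory testing. The sketch below applies commonly cited rules of thumb (length 18-25 nt, GC content 40-60%, and Wallace-rule Tm estimates, 2(A+T) + 4(G+C), within about 5° C. of each other). These thresholds are assumptions, not requirements of this disclosure; the check does not assess secondary structure or primer-dimer formation, and the sequences are hypothetical.

```python
def gc_content(seq):
    """Percentage of G and C bases in a primer."""
    if not seq:
        raise ValueError("primer must not be empty")
    seq = seq.upper()
    return 100.0 * sum(b in "GC" for b in seq) / len(seq)


def wallace_tm(seq):
    """Rough Tm (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = sum(b in "AT" for b in seq)
    gc = sum(b in "GC" for b in seq)
    return 2 * at + 4 * gc


def check_primer_pair(forward, reverse, gc_range=(40.0, 60.0),
                      len_range=(18, 25), max_tm_diff=5.0):
    """Return a list of rule-of-thumb violations (empty list if none)."""
    problems = []
    for label, primer in (("forward", forward), ("reverse", reverse)):
        if not len_range[0] <= len(primer) <= len_range[1]:
            problems.append(f"{label}: length {len(primer)} outside {len_range}")
        gc = gc_content(primer)
        if not gc_range[0] <= gc <= gc_range[1]:
            problems.append(f"{label}: GC {gc:.0f}% outside {gc_range}")
    tm_diff = abs(wallace_tm(forward) - wallace_tm(reverse))
    if tm_diff > max_tm_diff:
        problems.append(f"Tm difference {tm_diff} C exceeds {max_tm_diff} C")
    return problems


if __name__ == "__main__":
    # Hypothetical sequences chosen only to exercise the checks.
    print(check_primer_pair("AGCTAGCTAGCTAGCTAGCT", "TCGATCGATCGATCGATCGA"))
```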

Alternatively, arrays can be produced that include single-stranded nucleic acids that can hybridize to nucleic acids derived from one or more of the genes listed in Tables 2-5. Exemplary, non-limiting methods that can be used to produce and screen arrays are described herein above.

Therefore, in some embodiments the presently disclosed subject matter provides arrays comprising polynucleotides that are capable of hybridizing to nucleic acids derived from one or more, up to all, of the genes listed in Tables 2-5 and/or comprising specific peptide or polypeptide gene products of one or more, up to all, of the genes listed in Tables 2-5.

Alternatively or in addition, gene expression can be assayed by determining the levels at which polypeptides are present in PDAC tissue. This can also be done using arrays, and exemplary methods for producing peptide and/or polypeptide arrays attached to nitrocellulose-coated glass slides (Espejo et al., 2002), alkanethiol-coated gold surfaces (Houseman et al., 2002), poly-L-lysine-treated glass slides (Haab et al., 2001), aldehyde-treated glass slides (MacBeath & Schreiber, 2000; Salisbury et al., 2002), silane-modified glass slides (Fang et al., 2002; Seong, 2002), and nickel-treated glass slides (Zhu et al., 2001), among others, have been reported.

In some embodiments, the presently disclosed subject matter provides arrays that comprise peptides or polypeptides that correspond to one or more, up to all, of the genes listed in Tables 2-5. In these embodiments, arrays are produced from proteins isolated from PDAC tissue, and these arrays are then probed with molecules that specifically bind to the various gene products of interest, if present. Exemplary molecules that specifically bind to the gene products of one or more, up to all, of the genes listed in Tables 2-5 include antibodies (as well as fragments and derivatives thereof that include at least one Fab fragment). Antibodies to many of the polypeptides that correspond to the genes listed in Tables 2-5 are commercially available, and antibodies that specifically bind to gene products that are not commercially available can be produced using routine techniques.

Peptide and/or polypeptide arrays can be designed quantitatively such that the amount of each individual peptide or polypeptide is reflective of the amount of that individual peptide or polypeptide in the PDAC tissue.

Further, the arrays can be designed such that specific peptide or polypeptide gene products that correspond to one or more of the genes listed in Tables 2-5 can be localized (sometimes referred to as “spotted”) on the array such that the array can be interrogated with at least one antibody that specifically binds to one of the specific peptide or polypeptide gene products.

In some embodiments, gene expression at the level of protein is assayed without isolating the relevant peptides and/or polypeptides from the PDAC cells. For example, immunohistochemistry and/or immunocytochemistry can be employed, in which the expression levels of gene products that correspond to one or more of the genes listed in Tables 2-5 are determined by incubating appropriate binding molecules with PDAC cells and/or tissue. In some embodiments, the PDAC cells and/or tissue are embedded in paraffin blocks before the immunohistochemistry and/or immunocytochemistry is performed.
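Immunohistochemical staining is often summarized semi-quantitatively. One widely used scheme, shown here as an illustrative assumption rather than a scoring method specified in this disclosure, is the H-score: 1 x (% weakly stained cells) + 2 x (% moderately stained cells) + 3 x (% strongly stained cells), giving a value from 0 to 300. The percentages in the usage line are hypothetical.

```python
def h_score(pct_weak, pct_moderate, pct_strong):
    """H-score = 1*(% weak) + 2*(% moderate) + 3*(% strong); range 0-300."""
    for pct in (pct_weak, pct_moderate, pct_strong):
        if pct < 0:
            raise ValueError("percentages must be non-negative")
    if pct_weak + pct_moderate + pct_strong > 100:
        raise ValueError("percentages of stained cells cannot exceed 100 in total")
    return 1 * pct_weak + 2 * pct_moderate + 3 * pct_strong


if __name__ == "__main__":
    # Hypothetical staining distribution for one PDAC tissue section.
    print(h_score(pct_weak=10, pct_moderate=20, pct_strong=30))
```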

As would be understood by one of ordinary skill in the art upon consideration of the present disclosure, many of the manipulations disclosed herein can be automated, and it is intended that such automation is encompassed by the presently disclosed subject matter.

EXAMPLES

The following Examples provide further illustrative embodiments. In light of the present disclosure and the general level of skill in the art, those of skill will appreciate that the following Examples are intended to be exemplary only and that numerous changes, modifications, and alterations can be employed without departing from the scope of the presently disclosed subject matter.

TABLE 2
Exemplary Genes Associated with Activated Stroma Subtype and Exemplary Chemotherapeutics Applicable Thereto

Gene symbol | Possible drug(s)
--- | ---
ANXA1 | Hydrocortisone, hydrocortisone/prednisone, hydrocortisone/mitoxantrone
AOC3 | Hydralazine, hydralazine/hydrochlorothiazide/reserpine, hydralazine/hydrochlorothiazide, hydralazine/isosorbide dinitrate
APP | Bapineuzumab, florbetapir F18, florbetaben F
ATP1A1 | Digoxin, trichloromethiazide, ciclopirox olamine, ethacrynic acid, reserpine/trichloromethiazide, bretylium, perphenazine, ouabain, digitoxin
AXL | Cabozantinib, cabozantinib/erlotinib
BDKRB2 | Anatibant, icatibant
C1S | SERPING1
CCR5 | Maraviroc, vicriviroc, ancriviroc
CD52 | Alemtuzumab, alemtuzumab/cyclosporin A, alemtuzumab/cyclophosphamide/fludarabine phosphate/rituximab, alemtuzumab/fludarabine phosphate, alemtuzumab/rituximab, alemtuzumab/cyclophosphamide/fludarabine phosphate/mitoxantrone, alemtuzumab/pentostatin, alemtuzumab/bendamustine
CFTR | Crofelemer, ivacaftor
COL10A1 | Collagenase clostridium histolyticum
COL11A1 | Collagenase clostridium histolyticum
COL12A1 | Collagenase clostridium histolyticum
COL16A1 | Collagenase clostridium histolyticum
COL1A1 | Collagenase clostridium histolyticum
COL1A2 | Collagenase clostridium histolyticum
COL3A1 | Collagenase clostridium histolyticum
COL4A2 | Collagenase clostridium histolyticum
COL5A1 | Collagenase clostridium histolyticum
COL5A2 | Collagenase clostridium histolyticum
COL8A1 | Collagenase clostridium histolyticum
COL8A2 | Collagenase clostridium histolyticum
CSF1R | Nilotinib, sunitinib, pazopanib
CXCR4 | Cladribine/cytarabine/filgrastim/idarubicin/plerixafor, plerixafor
EDNRA | Bosentan, avosentan, clazosentan, ambrisentan, sitaxsentan, zibotentan, SB 234551, TBC 3214, BSF 302146, macitentan, fandosentan, atrasentan
EPCAM | Tucotuzumab celmoleukin, catumaxomab, adecatumumab
ERBB2 | Trastuzumab, BMS-599626, varlitinib, XL647, CP-724,714, afatinib, pertuzumab, sapitinib, trastuzumab emtansine, lapatinib/pazopanib, lapatinib/letrozole, paclitaxel/trastuzumab, capecitabine/lapatinib, cyclophosphamide/docetaxel/epirubicin/5-fluorouracil/trastuzumab, docetaxel/trastuzumab, paclitaxel/pertuzumab/trastuzumab, trastuzumab/vinorelbine, capecitabine/trastuzumab, lapatinib/paclitaxel, pertuzumab/trastuzumab, lapatinib/trastuzumab, neratinib, lapatinib, erlotinib
F2R | Chrysalin, argatroban, bivalirudin
FCGR1B | IgG
FCGR2A | IgG
FN1 | Ocriplasmin
FYN | Dasatinib
GABRP | Alphadolone, nitrazepam, adinazolam, sevoflurane, isoflurane, isoniazid, felbamate, etomidate, halothane, fluoxetine/olanzapine, estazolam, eszopiclone, quazepam, diazepam, temazepam, zolpidem, lorazepam, olanzapine, triazolam, flurazepam, midazolam, oxazepam, zaleplon, secobarbital, phenobarbital, pentobarbital, desflurane, methoxyflurane, enflurane
HLA-DRB1 | Apolizumab
IL1R1 | Anakinra
ITGAV | Abciximab, CNTO 95, cilengitide
ITGB5 | Cilengitide
KCNJ8 | Gliquidone, thiamylal
KCNN4 | Betamethasone/clotrimazole, clotrimazole, senicapoc
KCNQ1 | Dextromethorphan/quinidine, indapamide, quinidine
KIT | Dasatinib, sunitinib, pazopanib, tivozanib, motesanib, OSI-930, telatinib, tandutinib, cabozantinib, regorafenib, ponatinib, bortezomib/sorafenib, lapatinib/pazopanib, dexamethasone/lenalidomide/sorafenib, bevacizumab/sorafenib, imatinib/sirolimus, cabozantinib/erlotinib, imatinib, sorafenib
MET | Crizotinib, tivantinib, cabozantinib, INC280, cabozantinib/erlotinib
MMP11 | Marimastat
MMP7 | Marimastat
MUC1 | HuHMFG1
NNMT | Atorvastatin/niacin, nicotinic acid/pioglitazone, nicotinic acid, lovastatin/niacin
PDGFRA | Sunitinib, pazopanib, axitinib, telatinib, regorafenib, lapatinib/pazopanib, imatinib/sirolimus, imatinib, becaplermin
PDGFRB | Nilotinib, dasatinib, sunitinib, pazopanib, axitinib, tivozanib, tandutinib, regorafenib, bortezomib/sorafenib, lapatinib/pazopanib, dexamethasone/lenalidomide/sorafenib, bevacizumab/sorafenib, imatinib/sirolimus, imatinib, sorafenib, becaplermin
PLA2G7 | Darapladib
PLAT | 6-aminocaproic acid
PTGER2 | Misoprostol, prostaglandin E2, prostaglandin E1, CP 533536, diclofenac/misoprostol
RAMP1 | Pramlintide
SLC12A2 | Bumetanide, quinethazone
TEK | Cabozantinib, regorafenib, ponatinib, cabozantinib/erlotinib, vandetanib
TLR4 | Resatorvid
TLR7 | UC-1V150, 5-fluorouracil/imiquimod, resiquimod, hydroxychloroquine, imiquimod

TABLE 3
Exemplary Genes Associated with Normal Stroma Subtype and Exemplary Chemotherapeutics Applicable Thereto

Gene symbol | Possible drug(s)
--- | ---
ACE2 | Hydrochlorothiazide/lisinopril, hydrochlorothiazide/moexipril, moexipril, lisinopril
ADH1A | Caffeine/ethanol, 4-methylpyrazole (Fomepizole), ethanol
ADH1C | Caffeine/ethanol, 4-methylpyrazole (Fomepizole), ethanol
ADRB2 | Articaine/epinephrine, bupivacaine/epinephrine, carteolol, dipivefrin, meluadrine, epinephrine/prilocaine, epinephrine/lidocaine, bedoradrine, KUL 7211, arformoterol, indacaterol, myogane, budesonide/formoterol, nebivolol, vilanterol, olodaterol, formoterol/mometasone furoate, glycopyrrolate/indacaterol, fluticasone furoate/vilanterol, latanoprost/timolol, umeclidinium/vilanterol, fluticasone/salmeterol, albuterol/ipratropium, isoprenaline, carvedilol, ephedrine, guanethidine, levalbuterol, propranolol, pindolol, esmolol, metoprolol, alprenolol, salmeterol, dorzolamide/timolol, fluoxetine/olanzapine, guanadrel, bendroflumethiazide/nadolol, isoxsuprine, hydrochlorothiazide/propranolol, hydrochlorothiazide/timolol, isoproterenol, sotalol, bambuterol, nadolol, timolol, isoetharine, ritodrine, olanzapine, venlafaxine, labetalol, formoterol, bitolterol, albuterol, terbutaline, procaterol, pirbuterol, clenbuterol, fenoterol, norepinephrine, metaproterenol sulfate, epinephrine, dobutamine, droxidopa, arbutamine
AGTR1 | Amlodipine/olmesartan medoxomil, olmesartan, amlodipine/hydrochlorothiazide/valsartan, amlodipine/telmisartan, aliskiren/valsartan, azilsartan kamedoxomil, amlodipine/hydrochlorothiazide/olmesartan medoxomil, aspirin/dipyridamole/telmisartan, clopidogrel/telmisartan, amlodipine/valsartan, hydrochlorothiazide/losartan, hydrochlorothiazide/valsartan, candesartan, candesartan cilexetil, olmesartan medoxomil, irbesartan, losartan potassium, telmisartan, eprosartan, candesartan cilexetil/hydrochlorothiazide, hydrochlorothiazide/irbesartan, eprosartan/hydrochlorothiazide, hydrochlorothiazide/telmisartan, hydrochlorothiazide/olmesartan medoxomil, valsartan
ANXA1 | Hydrocortisone, hydrocortisone/prednisone, hydrocortisone/mitoxantrone
AOC3 | Hydralazine, hydralazine/hydrochlorothiazide/reserpine, hydralazine/hydrochlorothiazide, hydralazine/isosorbide dinitrate
APP | Bapineuzumab, florbetapir F18, florbetaben F
ATP1A1 | Digoxin, trichloromethiazide, ciclopirox olamine, ethacrynic acid, reserpine/trichloromethiazide, bretylium, perphenazine, ouabain, digitoxin
ATP1A2 | Digoxin, ethacrynic acid, perphenazine
AXL | Cabozantinib, cabozantinib/erlotinib
BDKRB2 | Anatibant, icatibant
C1S | Serpin peptidase inhibitor (SERPING1)
CNR1 | Trans-(±)-nabilone, SLV 319, rimonabant, BAY 38-7271, delta-8-tetrahydrocannabinol, delta-9-tetrahydrocannabinol
CSF1R | Nilotinib, sunitinib, pazopanib
CXCR4 | Cladribine/cytarabine/filgrastim/idarubicin/plerixafor, plerixafor
ERBB2 | Trastuzumab, BMS-599626, varlitinib, XL647, CP-724,714, afatinib, pertuzumab, sapitinib, trastuzumab emtansine, lapatinib/pazopanib, lapatinib/letrozole, paclitaxel/trastuzumab, capecitabine/lapatinib, cyclophosphamide/docetaxel/epirubicin/5-fluorouracil/trastuzumab, docetaxel/trastuzumab, paclitaxel/pertuzumab/trastuzumab, trastuzumab/vinorelbine, capecitabine/trastuzumab, lapatinib/paclitaxel, pertuzumab/trastuzumab, lapatinib/trastuzumab, neratinib, lapatinib, erlotinib
FYN | Dasatinib
GHR | GH1, pegvisomant, somatrem
HBB | Iron dextran
HLA-DRB1 | Apolizumab
IL1R1 | Anakinra
ITGAV | Abciximab, CNTO 95, cilengitide
ITGB5 | Cilengitide
KCNJ8 | Gliquidone, thiamylal
KCNK1 |
KCNMB4 | Tedisamil
KIT | Dasatinib, sunitinib, pazopanib, tivozanib, motesanib, OSI-930, telatinib, tandutinib, cabozantinib, regorafenib, ponatinib, bortezomib/sorafenib, lapatinib/pazopanib, dexamethasone/lenalidomide/sorafenib, bevacizumab/sorafenib, imatinib/sirolimus, cabozantinib/erlotinib, imatinib, sorafenib
LEPR | Recombinant-methionyl human leptin
LPL | Atorvastatin/niacin, nicotinic acid/pioglitazone, nicotinic acid, tyloxapol, lovastatin/niacin
PDGFRA | Sunitinib, pazopanib, axitinib, telatinib, regorafenib, lapatinib/pazopanib, imatinib/sirolimus, imatinib, becaplermin
PDGFRB | Nilotinib, dasatinib, sunitinib, pazopanib, axitinib, tivozanib, tandutinib, regorafenib, bortezomib/sorafenib, lapatinib/pazopanib, dexamethasone/lenalidomide/sorafenib, bevacizumab/sorafenib, imatinib/sirolimus, imatinib, sorafenib, becaplermin
PLA2G2A | Varespladib methyl, varespladib, indomethacin
RAMP1 | Pramlintide
RAMP3 | Pramlintide
S1PR1 | Fingolimod
SCN7A | Riluzole
TEK | Cabozantinib, regorafenib, ponatinib, cabozantinib/erlotinib, vandetanib

TABLE 4 Exemplary Genes Associated with Basal Subtype and Exemplary Chemotherapeutics Applicable Thereto
Gene symbol | Possible drug(s)
ADORA2B | Adenosine, enprofylline, dyphylline, aspirin/butalbital/caffeine, acetaminophen/caffeine/dihydrocodeine, acetaminophen/aspirin/caffeine, caffeine/ergotamine, aspirin/caffeine/propoxyphene, aspirin/butalbital/caffeine/codeine, aspirin/caffeine/dihydrocodeine, acetaminophen/butalbital/caffeine, aminophylline, aspirin/caffeine/orphenadrine, acetaminophen/butalbital/caffeine/codeine, theophylline, caffeine, acetaminophen/caffeine/chlorpheniramine/hydrocodone/phenylephrine
ANXA1 | Hydrocortisone, hydrocortisone/prednisone, hydrocortisone/mitoxantrone
ATP1A1 | Digoxin, trichloromethiazide, ciclopirox olamine, ethacrynic acid, reserpine/trichloromethiazide, bretylium, perphenazine, ouabain, digitoxin
AXL | Cabozantinib, cabozantinib/erlotinib
BDKRB2 | Anatibant, icatibant
COL17A1 | Collagenase clostridium histolyticum
DDR1 | Nilotinib
EGFR | Cetuximab, AEE 788, panitumumab, BMS-599626, varlitinib, XL647, bevacizumab/erlotinib, afatinib, sapitinib, cetuximab/irinotecan, lapatinib/pazopanib, irinotecan/panitumumab, erlotinib/vismodegib, erlotinib/gemcitabine, lapatinib/letrozole, capecitabine/lapatinib, bevacizumab/panitumumab, bevacizumab/cetuximab, capecitabine/erlotinib, lapatinib/paclitaxel, cabozantinib/erlotinib, lapatinib/trastuzumab, canertinib, gefitinib, neratinib, PD 153035, lapatinib, vandetanib, erlotinib
EPCAM | Tucotuzumab celmoleukin, catumaxomab, adecatumumab
ERBB2 | Trastuzumab, BMS-599626, varlitinib, XL647, CP-724,714, afatinib, pertuzumab, sapitinib, trastuzumab emtansine, lapatinib/pazopanib, lapatinib/letrozole, paclitaxel/trastuzumab, capecitabine/lapatinib, cyclophosphamide/docetaxel/epirubicin/5-fluorouracil/trastuzumab, docetaxel/trastuzumab, paclitaxel/pertuzumab/trastuzumab, trastuzumab/vinorelbine, capecitabine/trastuzumab, lapatinib/paclitaxel, pertuzumab/trastuzumab, lapatinib/trastuzumab, neratinib, lapatinib, erlotinib
GABRP | Alphadolone, nitrazepam, adinazolam, sevoflurane, isoflurane, isoniazid, felbamate, etomidate, halothane, fluoxetine/olanzapine, estazolam, eszopiclone, quazepam, diazepam, temazepam, zolpidem, lorazepam, olanzapine, triazolam, flurazepam, midazolam, oxazepam, zaleplon, secobarbital, phenobarbital, pentobarbital, desflurane, methoxyflurane, enflurane
IFNAR1 | Interferon alfacon-1, PEG-interferon alfa-2a, interferon beta-1a, recombinant interferon, PEG-interferon alfa-2a/telaprevir, pegintron/ribavirin, interferon alfa-n1, PEG-interferon alfa-2a/ribavirin, IFNA2, hydroxyurea/recombinant interferon, interferon alfa-2b/ribavirin, pegintron, interferon beta-1b
ITGAV | Abciximab, CNTO 95, cilengitide
ITGB5 | Cilengitide
KCNMB4 | Tedisamil
KCNN4 | Betamethasone/clotrimazole, clotrimazole, senicapoc
MET | Crizotinib, tivantinib, cabozantinib, INC280, cabozantinib/erlotinib
MMP7 | Marimastat
MST1R | Crizotinib
MUC1 | HuHMFG1
NOXO1 | Ecabet
P2RY2 | Suramin
PLAT | 6-aminocaproic acid
PSCA | AGS-1C4D4
PTK6 | Vandetanib
RAMP1 | Pramlintide
SCNN1A | Hydrochlorothiazide/triamterene, amiloride, amiloride/hydrochlorothiazide, triamterene

TABLE 5 Exemplary Genes Associated with Classical Subtype and Exemplary Chemotherapeutics Applicable Thereto
Gene symbol | Possible drug(s)
ACE2 | Hydrochlorothiazide/lisinopril, hydrochlorothiazide/moexipril, moexipril, lisinopril
ATP1A1 | Digoxin, trichloromethiazide, ciclopirox olamine, ethacrynic acid, reserpine/trichloromethiazide, bretylium, perphenazine, ouabain, digitoxin
BDKRB2 | Anatibant, icatibant
CFTR | Crofelemer, ivacaftor
CYP3A4 | Cobicistat, cobicistat/elvitegravir/emtricitabine/tenofovir disoproxil, ketoconazole
CYP3A7 | Cobicistat, cobicistat/elvitegravir/emtricitabine/tenofovir disoproxil
DDR1 | Nilotinib
EPCAM | Tucotuzumab celmoleukin, catumaxomab, adecatumumab
ERBB2 | Trastuzumab, BMS-599626, varlitinib, XL647, CP-724,714, afatinib, pertuzumab, sapitinib, trastuzumab emtansine, lapatinib/pazopanib, lapatinib/letrozole, paclitaxel/trastuzumab, capecitabine/lapatinib, cyclophosphamide/docetaxel/epirubicin/5-fluorouracil/trastuzumab, docetaxel/trastuzumab, paclitaxel/pertuzumab/trastuzumab, trastuzumab/vinorelbine, capecitabine/trastuzumab, lapatinib/paclitaxel, pertuzumab/trastuzumab, lapatinib/trastuzumab, neratinib, lapatinib, erlotinib
F5 | Drotrecogin alfa, antithrombin alfa
GABRP | Alphadolone, nitrazepam, adinazolam, sevoflurane, isoflurane, isoniazid, felbamate, etomidate, halothane, fluoxetine/olanzapine, estazolam, eszopiclone, quazepam, diazepam, temazepam, zolpidem, lorazepam, olanzapine, triazolam, flurazepam, midazolam, oxazepam, zaleplon, secobarbital, phenobarbital, pentobarbital, desflurane, methoxyflurane, enflurane
HLA-DRB1 | Apolizumab
ITGB5 | Cilengitide
KCNN4 | Betamethasone/clotrimazole, clotrimazole, senicapoc
KCNQ1 | Dextromethorphan/quinidine, indapamide, quinidine
MET | Crizotinib, tivantinib, cabozantinib, INC280, cabozantinib/erlotinib
MMP7 | Marimastat
MST1R | Crizotinib
MUC1 | HuHMFG1
NOXO1 | Ecabet
P2RY2 | Suramin
PLA2G10 | Varespladib methyl, varespladib
PSCA | AGS-1C4D4
RAMP1 | Pramlintide
SCNN1A | Hydrochlorothiazide/triamterene, amiloride, amiloride/hydrochlorothiazide, triamterene
SLC12A2 | Bumetanide, quinethazone

TABLE 6 Exemplary Kinases as Therapeutic Targets for Classical and Basal Subtype Tumors
Gene name | Description | Subtype
CDK1 | Cyclin-dependent kinase 1 | Basal
CDK6 | Cyclin-dependent kinase 6 | Basal
EPHA1 | Ephrin type-A receptor 1 | Basal
EPHB2 | Ephrin type-B receptor 2 | Basal
KAPCA | cAMP-dependent protein kinase catalytic subunit alpha | Classical
KAPCB | cAMP-dependent protein kinase catalytic subunit beta | Classical
KCC2D | Calcium/calmodulin-dependent protein kinase type II subunit delta | Classical
KGP1 | cGMP-dependent protein kinase 1 | Classical
LIMK1 | LIM domain kinase 1 | Basal
PGFRB | Platelet-derived growth factor receptor beta | Classical
RIPK2 | Receptor-interacting serine/threonine-protein kinase 2 | Basal

By applying a computational approach to a large cohort of data, the presently disclosed subject matter overcame the low cellularity problem and generated new insights into the complex molecular composition of PDAC. The results disclosed herein and their prognostic values can thus provide decision support in a clinical setting for the choice and timing of treatment regimens.

Co-expression of stromal gene signatures was largely conserved across other large primary tumor datasets (The Cancer Genome Atlas Research Network, 2014a,b; Nones et al., 2014). Co-expression was particularly high in lung adenocarcinoma (The Cancer Genome Atlas Research Network, 2012b), which was previously shown to be low in purity (Carter et al., 2012) and high in stromal content (Yoshihara et al., 2013). Both expression and co-expression were low in primary acute myeloid leukemia (The Cancer Genome Atlas Research Network, 2013c), which lacks stroma.

Materials and Methods for Examples 1-8

Decomposition by factors and gene ranking. For all analyses herein, k=14 was used as the number of factors. Unsupervised NMF of a gene-by-sample matrix X was performed in two stages. First, 20 randomly initialized instances of NMF were each run for 10 steps using the MATLAB (MathWorks R2013a) multiplicative update NMF solver. The lowest-residual solution pair from these 20 instances was then used to seed NMF of X, which was run to convergence with the alternating least-squares solver. The result was a matrix of gene loadings, G, and a matrix of sample loadings, S. G and S were then scaled such that the mean of each column of G was 1, to facilitate cross-factor comparisons.
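The two-stage decomposition above can be sketched in plain Python as follows. This is a minimal illustration, not the MATLAB implementation used in the disclosure. It assumes the factorization X ≈ G·Sᵀ (G is genes × k and S is samples × k, consistent with rows of S holding per-sample factor weights). It uses Lee-Seung multiplicative updates both for the short random restarts and for the refinement, whereas the disclosure refines with an alternating least-squares solver. The helper names and the `final_steps` count are hypothetical choices.

```python
import random

def matmul(A, B):
    """Multiply matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def residual(X, G, S):
    """Frobenius norm of X - G S^T (X: genes x samples, G: genes x k, S: samples x k)."""
    R = matmul(G, transpose(S))
    return sum((x - r) ** 2 for xrow, rrow in zip(X, R) for x, r in zip(xrow, rrow)) ** 0.5

def multiplicative_updates(X, G, S, steps, eps=1e-12):
    """Lee-Seung multiplicative updates for X ~ G S^T; nonnegative entries stay nonnegative."""
    for _ in range(steps):
        # G <- G * (X S) / (G S^T S)
        num, den = matmul(X, S), matmul(G, matmul(transpose(S), S))
        G = [[g * n / (d + eps) for g, n, d in zip(*rows)] for rows in zip(G, num, den)]
        # S <- S * (X^T G) / (S G^T G)
        num, den = matmul(transpose(X), G), matmul(S, matmul(transpose(G), G))
        S = [[s * n / (d + eps) for s, n, d in zip(*rows)] for rows in zip(S, num, den)]
    return G, S

def restarted_nmf(X, k, restarts=20, short_steps=10, final_steps=500, seed=0):
    """Short random restarts, keep the lowest-residual pair, refine it, then rescale."""
    rng = random.Random(seed)
    n_genes, n_samples = len(X), len(X[0])
    best = None
    for _ in range(restarts):
        G = [[rng.random() for _ in range(k)] for _ in range(n_genes)]
        S = [[rng.random() for _ in range(k)] for _ in range(n_samples)]
        G, S = multiplicative_updates(X, G, S, short_steps)
        err = residual(X, G, S)
        if best is None or err < best[0]:
            best = (err, G, S)
    # Stand-in for the alternating least-squares refinement: continue multiplicative updates.
    G, S = multiplicative_updates(X, best[1], best[2], final_steps)
    # Rescale so every column of G has mean 1; S absorbs the factor so G S^T is unchanged.
    means = [sum(col) / n_genes or 1.0 for col in zip(*G)]
    G = [[g / m for g, m in zip(row, means)] for row in G]
    S = [[s * m for s, m in zip(row, means)] for row in S]
    return G, S
```

Because each column of G is divided by its mean and the matching column of S is multiplied by the same value, the product G·Sᵀ, and therefore the residual, is unaffected by the rescaling.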

For each of the k factors, a set of distinct exemplar genes for the i-th factor was established by ranking genes in descending order of the difference between each gene's loading in the i-th column of matrix G and that gene's largest loading in any other column of G.
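The exemplar ranking can be expressed as a per-gene margin score, score(g, i) = G[g][i] − max over j ≠ i of G[g][j]. Below is a minimal sketch. It assumes G is stored as a list of per-gene rows with at least two factors, and the function name is hypothetical.

```python
def exemplar_genes(G, genes, i, top=None):
    """Rank genes for factor i by their loading on i minus their largest loading on any other factor."""
    scored = []
    for name, row in zip(genes, G):
        best_other = max(v for j, v in enumerate(row) if j != i)
        scored.append((row[i] - best_other, name))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # largest margin first
    ranked = [name for _, name in scored]
    return ranked if top is None else ranked[:top]
```

A gene with a large loading on several factors therefore ranks low, which is what makes the resulting exemplars distinct to a single factor.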

To achieve stable NMF results, 200 iterations of 5-fold resampling were performed, i.e., each iteration trained on a partition containing approximately 80% of the samples. For each of these 200 data partitions, unsupervised NMF was performed, and every pair of genes that appeared together among the top 50 ranked genes of any factor was recorded in a gene-by-gene consensus matrix. This gene factor co-occurrence consensus matrix was then hierarchically clustered, using correlation as the distance metric and a cutoff chosen to yield k gene clusters. These k gene clusters were used to create a seed matrix, G₀, such that the i-th column of G₀ contained 0.01 for all genes except those in gene cluster i, which were set to 1. G₀ was then used to seed a final NMF, which was run to completion with the multiplicative update solver.
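Below is a sketch of the resampling bookkeeping and of the seed matrix G₀ described above. It rests on several assumptions. The partition scheme (shuffle, then drop every fifth shuffled sample) is just one simple way to hold out roughly 20% of the samples. Gene pairs are counted in a sparse `Counter` rather than a dense gene-by-gene matrix. The correlation-based hierarchical clustering step, which turns co-occurrence counts into k clusters, is omitted. All function names are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations

def resampled_partitions(n_samples, n_iter=200, folds=5, seed=0):
    """Per iteration, shuffle sample indices and hold out one fold (~80% kept when folds=5)."""
    rng = random.Random(seed)
    partitions = []
    for _ in range(n_iter):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        held_out = set(idx[::folds])  # every folds-th shuffled sample forms the held-out fold
        partitions.append(sorted(i for i in idx if i not in held_out))
    return partitions

def add_cooccurrence(counts, top_lists):
    """Increment the count of every gene pair appearing together in one factor's top list."""
    for top in top_lists:
        for pair in combinations(sorted(set(top)), 2):
            counts[pair] += 1
    return counts

def seed_matrix(genes, clusters, low=0.01, high=1.0):
    """G0: column i holds `high` for genes in cluster i and `low` for all other genes."""
    k = len(clusters)
    membership = {g: i for i, members in enumerate(clusters) for g in members}
    return [[high if membership.get(g) == i else low for i in range(k)] for g in genes]
```

In a full pipeline, `add_cooccurrence` would be called once per partition with the top-50 exemplar lists of each factor.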

Gene set analysis was performed on the ranked list of genes for each factor using all gene sets available from MSigDB v3.1 (Subramanian et al., 2005). Gene sets were assessed for significance using the Kolmogorov-Smirnov statistic with Benjamini-Hochberg correction. Because each ranked gene list reflects only positive (nonnegative) loadings, only gene sets with positive enrichment were considered.
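The sketch below pairs an unweighted Kolmogorov-Smirnov running-sum score, which rewards gene-set members near the top of a factor's ranked list, with a Benjamini-Hochberg adjustment. It is a simplification under assumptions. MSigDB loading and the computation of nominal p-values (for example, by permutation) are not shown. Keeping only the maximum positive deviation mirrors the restriction to positive enrichment. The function names are hypothetical.

```python
def ks_positive_enrichment(ranked_genes, gene_set):
    """Maximum positive deviation of an unweighted KS running sum.

    Walking down the ranked list (assumed duplicate-free), the sum rises by 1/n_hits at each
    gene-set member and falls by 1/n_misses otherwise; a large maximum means members sit near the top.
    """
    hits = set(gene_set).intersection(ranked_genes)
    n, n_hits = len(ranked_genes), len(hits)
    if n_hits == 0 or n_hits == n:
        return 0.0
    step_up, step_down = 1.0 / n_hits, 1.0 / (n - n_hits)
    running, best = 0.0, 0.0
    for gene in ranked_genes:
        running += step_up if gene in hits else -step_down
        best = max(best, running)
    return best

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down, enforcing monotonicity
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```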

Patients and Samples. Multiple samples were obtained from 15 patients with metastatic PDAC from the University of Nebraska Medical Center Rapid Autopsy Pancreatic Program and from 17 patients from Johns Hopkins Medical Institutions and the Johns Hopkins Gastrointestinal Cancer Rapid Medical Donation Program. Informed consent was obtained from all subjects. To minimize tissue degradation, organs were harvested within 3 hours postmortem and the specimens were flash frozen in liquid nitrogen. The cohort further included patients with resected PDAC and/or normal tissue from Johns Hopkins Medical Institutions, Northwestern Memorial Hospital, NorthShore Hospital, and the University of North Carolina (UNC) hospitals. All samples were collected between 1999 and 2009 and flash frozen in liquid nitrogen at the time of operation, with approval from each institution's IRB. The UNC IRB approved use of all de-identified samples for this study. Some of these samples were previously published, using a different normalization procedure, as part of GSE21501 (Garrido-Laguna et al., 2011). All available samples were reviewed by a single pathologist (KEV).

The microarray cohort employed herein consisted of 145 primary (125 with survival data) and 61 metastatic PDAC tumors, 17 cell lines, 47 pancreas and 89 distant site adjacent normal samples, providing a rare diversity of tissue types with which to train the model. This data set represents an expansion of the 106 primary tumors in the previously published cohort GSE21501 (Garrido-Laguna et al., 2011), which was a bulk analysis of gene expression confined to primary tumors. The BxPC-3, MIA PaCa-2, HPAC, Panc 02.03, SW1990, HPAF-II, CFPAC-1, PANC-1, Capan-1, Capan-2, Panc 10.05, Hs 766T, Panc 03.27, and T3M4 PDAC cell lines were obtained from ATCC (Manassas, Va., United States of America). HuPT3 cells (obtained from Dan Billadeau, Mayo Clinic, Rochester, Minn., United States of America) and the immortalized human pancreatic duct-derived (HPNE) cells were described previously (Neel et al., 2014). All cell lines were authenticated via short tandem repeat profiling (Genetica), and all cell lines were mycoplasma negative by indirect staining. For survival analysis, only data from patients with localized resected tumors were used. RNA sequencing was performed on an additional 15 primary tumors, 37 pancreatic cancer patient-derived xenografts (PDX), 3 cell lines (HuPT3 plus 2 PDX-derived), and 6 cancer-associated fibroblast (CAF) lines derived from deidentified patients with pancreatic cancer. Expression data have been uploaded to GEO.

PDX and derived cells. Fresh tumor samples from deidentified pancreatic ductal adenocarcinoma patients were obtained under protocols approved by the UNC IRB. All patient tissues were stained with hematoxylin and eosin (H&E) to confirm histology. The tumors were implanted subcutaneously into the flanks of 6- to 8-week-old female NSG or NOD/SCID mice and subsequently passaged into other mice under protocols approved by the Institutional Animal Care and Use Committee.

Cell lines were derived from PDX as follows. At the time of passage, a section of the tumor was cut into pieces of approximately 3 mm and rinsed with PBS containing penicillin and streptomycin (P/S). The tissue was minced with the GENTLEMACS™ Dissociator (Miltenyi Biotec) and incubated for 30 minutes in a Collagenase/Dispase (Roche 11097113001) solution. After incubation, mincing was repeated, the dissociation medium was removed, and the tissue was resuspended in DMEM/F12 medium (Life Technologies 11330-032) with 5 ng/ml EGF (Life Technologies PHG0311), 10 μg/ml insulin (Life Technologies 12585-014), 10% FBS, and 1×P/S, and seeded onto tissue culture-treated plates. Once the culture was established, differential trypsinization was used to remove fibroblasts, and the cells were seeded on gelatin-coated glass coverslips for immunofluorescence confirmation. Epithelial tumor cells were confirmed based on their expression of cytokeratin 18 or 19 and EpCAM (using Abcam ab133302 and ab76539 and BioLegend 324209 antibodies).

Primary CAF cell lines were isolated from tumors of patients with PDAC using the outgrowth method (Bachem et al., 2005) as follows. Fresh tumor was minced into pieces no larger than 1 mm³ and cultured in DMEM/Ham's F12 (1:1) medium supplemented with 10% FBS. Immunofluorescence was used to confirm the presence of CAFs, defined by the presence of smooth muscle actin alpha (SMAα; Santa Cruz Biotechnology 32251) and the mesenchymal marker vimentin (Cell Signaling 5741), and by the absence of the epithelial marker EpCAM (BioLegend 324209).

Statistical Analysis. For all analyses, sample size was limited to all appropriate cases with full data (i.e., no imputation was performed to estimate missing clinical information). Disease-specific survival or recurrence-free survival was analyzed using the Kaplan-Meier product-limit method, and the significance of clinicopathologic or subtype variables was measured by Cox proportional hazards regression. Multivariable associations with survival were also assessed using Cox proportional hazards regression. When more than 2 survival cohorts were compared, the log-rank test was used to assess global differences in survival. Fisher's exact test was used to analyze associations between 2 categorical variables. For continuous variables (e.g., stain intensity and factor weights), unpaired two-tailed two-sample t-tests were performed under the equal variance assumption. Box-and-whisker plots show the median, quartiles, and range of continuous data to illustrate their variability and degree of normality. Unless otherwise mentioned, sample-to-sample or gene-to-gene similarities were measured by correlation of log₂-transformed gene expression after normalizing each gene's expression to a mean of zero and a variance of one. Unless otherwise noted, clustering was done via consensus clustering of row-normalized gene expression. Consensus clustering consisted of 1000 iterations of k-means clustering, with 50% feature hold-out at each iteration, followed by hierarchical clustering of the consensus matrix with average linkage.
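Two of the tests named above are simple enough to sketch directly: the Kaplan-Meier product-limit estimate and a two-sided Fisher's exact test for a 2×2 table. This is a minimal sketch, not the software used for the reported analyses. The two-sided p-value sums the probabilities of all tables with the same margins that are no more probable than the observed one, which is a common convention. Cox regression and the log-rank test are not shown, and the function names are hypothetical.

```python
from math import comb

def kaplan_meier(times, events):
    """Product-limit survival estimate.

    events[i] is truthy for an observed event (death or recurrence) and falsy for a censored case.
    Returns (time, survival) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk, survival, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, removed = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # events and censored cases both leave the risk set after time t
    return curve

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, given fixed margins.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_observed = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_observed * (1 + 1e-7))
    return min(1.0, total)
```

For example, `fisher_exact_2x2(3, 1, 1, 3)` sums the tables with probability 1/70, 16/70, 16/70 and 1/70, giving 34/70.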

Microarray Data. All RNA isolation and hybridization were performed at UNC on Agilent human whole genome 4x44K microarrays (Agilent Technologies). RNA was extracted from macrodissected snap-frozen tumor samples using AllPrep kits (Qiagen) and quantified using NanoDrop spectrophotometry (Thermo Scientific). RNA quality was assessed with the Bioanalyzer 2100 (Agilent Technologies). RNA was selected for hybridization based on RNA integrity number and inspection of the 18S and 28S ribosomal RNA, and samples of similar RNA quality were selected. One microgram of RNA was used as template for cDNA preparation. Sample cDNA was labeled with Cy5-dUTP and a reference control (Stratagene) was labeled with Cy3-dUTP using the Agilent low RNA input linear amplification kit (Agilent Technologies), and the labeled material was hybridized overnight at 65°C to Agilent 4x44K whole human genome arrays (Agilent Technologies). Arrays were washed and scanned using an Agilent scanner (Agilent Technologies).

Arrays were annotated using GEO platform GPL4133 and analyzed using the log₂-transformed, background-corrected Cy5 signal to maintain positivity, since NMF requires nonnegative input. Multiple probes mapping to the same gene symbol were collapsed to their mean expression. Samples were normalized to each other via quantile normalization.
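The probe-collapsing and quantile-normalization steps can be sketched as shown below. Several assumptions apply. Values are taken to be already log₂-transformed. Probes are keyed by identifier, with a probe-to-gene-symbol mapping available (for example, from the GPL4133 annotation). Ties are broken by original order during quantile normalization. The names are illustrative only.

```python
from collections import defaultdict
from statistics import mean

def collapse_probes(probe_values, probe_to_gene):
    """Average the expression rows of all probes that map to the same gene symbol."""
    grouped = defaultdict(list)
    for probe, values in probe_values.items():
        grouped[probe_to_gene[probe]].append(values)
    return {gene: [mean(col) for col in zip(*rows)] for gene, rows in grouped.items()}

def quantile_normalize(samples):
    """Quantile-normalize equal-length sample vectors.

    Each value is replaced by the mean, across samples, of the values holding the same rank,
    so every sample ends up with an identical distribution.
    """
    n = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [mean(s[r] for s in sorted_samples) for r in range(n)]
    normalized = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])  # ties keep their original order
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = rank_means[rank]
        normalized.append(out)
    return normalized
```

Note that `quantile_normalize` takes one list per sample (a column of the gene-by-sample matrix), so a gene-by-sample table must be transposed before and after the call.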

RNAseq. Libraries were prepared from 200-1000 ng of total RNA with the TruSeq Stranded mRNA Sample Prep Kit (Illumina). 75-bp paired-end reads were sequenced on a NextSeq 500 Desktop Sequencer using a high output flow cell kit (Illumina). Reads were separated by species of origin using Xenome (Conway et al., 2012). Human- or mouse-specific reads were then aligned and quantified using TopHat2 (Kim et al., 2013) and Cufflinks (Trapnell et al., 2012) against hg19 or mm10, respectively, with the UCSC knownGene transcript and gene definitions (genome.ucsc.edu). mRNA gene expression was analyzed as log₂(1+FPKM), and KRAS mutation status was determined by manual curation of aligned human reads.

Validation Data Sets. Gene expression array data from resected primary tumor samples of the Australian Pancreatic Cancer Genome Initiative and the International Cancer Genome Consortium (ICGC) were obtained from GSE50827 (Biton et al., 2014). Associated open-access clinical data were obtained from the ICGC data portal (http://dcc.icgc.org/release_16). Patients with death events before 30 days were assumed to have had postoperative complications and were censored. Patients with metastases were excluded from survival analyses. Genomic subtypes, mutations, and amplifications were obtained from the supplemental materials of Waddell et al., 2015.
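The censoring and exclusion rules for the ICGC validation data can be made explicit with a small helper. The record fields (`days`, `died`, `metastatic`) and the function name are hypothetical. Treating a death on exactly day 30 as an event is also an assumption, since the text specifies only deaths before 30 days.

```python
def icgc_survival_records(patients, min_days=30):
    """Return (time, event) pairs for survival analysis from hypothetical patient records.

    Each record is a dict with 'days' (follow-up or time to death), 'died' (bool), and 'metastatic' (bool).
    """
    records = []
    for p in patients:
        if p["metastatic"]:
            continue  # patients with metastases are excluded from survival analyses
        # Deaths before min_days are treated as postoperative complications: censored (event = 0).
        event = 1 if p["died"] and p["days"] >= min_days else 0
        records.append((p["days"], event))
    return records
```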

Normalized gene expression, survival data, and PAM50 (Stolze et al., 2015) classifications for primary breast cancer (Perou) samples (n=295) from the UNC337 set were obtained from GSE18229 (Dal Molin et al., 2015).

Normalized RNAseq expression data for 845 primary tumors were obtained from TCGA (https://tcga-data.nci.nih.gov/tcga) as described by Hoadley et al., 2014 (Zhong et al., 2015).

Normalized RNAseq gene expression and partial survival data from 223 urothelial bladder carcinoma (BLCA) samples were obtained from TCGA (https://tcga-data.nci.nih.gov/tcga; Alexandrov et al., 2013b). Samples were classified as basal or luminal using the BASE47 classifications provided by Damrauer et al. (Isella et al., 2015).

Example 1 Virtual Microdissection of PDAC

Gene expression was analyzed in a microarray cohort of 145 primary and 61 metastatic PDAC tumors, 17 cell lines, 47 pancreas and 89 distant site adjacent normal samples using Agilent (Agilent Technologies) human whole genome 4x44K DNA microarrays; 106 of the primary tumors were previously used in a separate analysis of gene expression (GSE21501; Stratford et al., 2010). To validate the findings, RNA sequencing was further performed on 15 primary tumors, 37 pancreatic cancer patient-derived xenografts (PDX), 3 cell lines, and 6 cancer-associated fibroblast (CAF) lines derived from deidentified patients with pancreatic cancer. Histology of all available samples was reviewed by a single blinded pathologist (KEV). Table 7 summarizes the demographic and clinical characteristics of patients in the cohorts.

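TABLE 7 Demographics and Univariate Cox Analysis
Characteristic | All | Resected with survival | Univariate Cox p-value | Microarray primary | RNAseq primary | RNAseq PDX
Race: Caucasian | 128 | 121 | 0.507 | 99 | 9 | 25
Race: African-American | 23 | 18 | 0.333 | 10 | 3 | 8
Race: Other | 8 | 7 | 0.821 | 5 | 0 | 3
Gender: F | 90 | 83 | 0.348 | 67 | 5 | 23
Gender: M | 80 | 68 | 0.348 | 55 | 8 | 14
T stage: T1 | 4 | 4 | 0.420 | 2 | 1 | 2
T stage: T2 | 22 | 20 | 0.530 | 20 | 2 | 5
T stage: T3 | 131 | 122 | 0.743 | 91 | 9 | 28
T stage: T4 | 1 | 1 | 0.115 | 1 | 0 | 0
N stage: N0 | 49 | 43 | 0.068 | 36 | 7 | 10
N stage: N1 | 112 | 106 | 0.068 | 80 | 5 | 25
M stage: M0 | 160 | 149 | n/a | 129 | 12 | 35
M stage: M1 | 15 | 0 | n/a | 14 | 0 |
Adjuvant therapy: Yes | 74 | 70 | 0.055 | 44 | 5 | 21
Adjuvant therapy: No | 30 | 28 | 0.055 | 27 | 3 | 7
Differentiation: Well | 16 | 13 | 0.940 | 16 | 0 | 1
Differentiation: Moderate | 49 | 47 | 0.398 | 49 | 1 | 3
Differentiation: Poor | 34 | 31 | 0.407 | 34 | 1 | 2
PDX graft: Success | 44 | 37 | 0.164 | 11 | 8 | 37
PDX graft: Failure | 18 | 12 | 0.164 | 9 | 3 | 0
Margin: Positive | 58 | 52 | 0.026 | 34 | 5 | 17
Margin: Negative | 93 | 88 | 0.026 | 75 | 7 | 17
TOTAL | 193 | 163 | | 143 | 15 | 37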

Example 2 NMF Distinguishes Normal and Tumor Compartments

A key obstacle in the analysis of gene expression data, particularly in PDAC, is the removal of confounding gene expression contributed by normal tissue or stroma at local and distant organ sites. FIGS. 1A-1D show example histology of samples containing tumor, normal, and stromal tissue. NMF was employed to identify gene expression attributable to normal pancreas, liver, lung, muscle, and immune tissues. Expression of exemplar genes from these factors (i.e., genes with distinctly large weights in a single column of G), as well as the factor weights for the samples (i.e., rows of S), showed excellent agreement with known tissue labels (see FIG. 3B, FIG. 3C, and FIG. 5). Investigation of the exemplar genes from these factors further confirmed that they represent confounding normal tissue. For example, using the Kolmogorov-Smirnov test, the top-weighted genes from the liver factor showed significant (p<10⁻¹⁰) enrichment in the MSigDB term SU_LIVER, and the highest weighted gene, fibrinogen beta (FGB), was specifically expressed in normal human liver tissue.

In addition to normal tissue from distant organs, two factors were identified that were exclusive to pancreas tissue, but were differentiated from each other by their respective gene lists. One factor described endocrine function including expression of glucagon and insulin (GCG and INS), while the other factor described exocrine function including expression of digestive enzyme genes such as pancreatic lipase, PNLIP. This unsupervised discovery of two molecularly distinct yet highly co-localized factors related to normal pancreatic function represented an important proof of concept in the use of NMF to identify novel features without pre-defined expression knowledge.

To validate the normal expression signatures disclosed herein, all available samples were reviewed by a single pathologist to independently assess the amount of tumor, normal, and stroma cellularity. Many factor weights were found to be correlated or anti-correlated with tumor cellularity (FIG. 6). Among normal and metastatic liver samples, for example, tumor-specific basal-like factor weights were correlated with cellularity, whereas the normal-specific liver factor weight was inversely related to the tumor content of a sample (FIG. 3D). These findings support the hypothesis that factor weights obtained from NMF are quantitatively indicative of underlying sample composition.

Example 3 Identification of Stroma-Specific Subtypes

Stroma is particularly important in PDAC. According to pathology assessments, stromal content varies widely and comprises on average 48% (standard deviation 30%) of the primary tumor samples employed herein. The instant analysis identified two factors describing gene expression from the stroma, which were distinctly different from the normal factors shown in FIGS. 3A-3D. Consensus clustering on exemplar genes from these two stroma factors divided tumor samples into two stromal subtypes, which were classified as “normal” and “activated” (FIG. 4A). Patients whose tumors had an activated stroma subtype had a worse median survival (15 months) and 1-year survival (60%) than patients with a normal stroma subtype (median 24 months, 1-year survival 82%; FIG. 4B). Both stromal factors were notably absent in PDAC cell lines (FIG. 4C), which instead exhibited a distinct mitotic expression signature associated with mitotic checkpoints and DNA replication (Table 8; Whitfield et al., 2002). The fact that cell lines did not express these stromal factors, and that many metastatic samples expressed them only at low levels, suggested that these genes were not expressed by the tumor epithelium. To further validate the stromal origin of these gene expression signatures, 6 CAF lines were isolated from primary tumors (FIGS. 8A-8F); these CAF lines robustly overexpressed the stromal signatures disclosed herein, whereas PDAC tumor cell lines had no expression of the stromal signatures (FIG. 4C).

The vast majority of collagen gene expression was attributable to stromal compartments, the lone exception being COL17A1, which was high in tumors. “Normal” stroma was characterized by relatively high expression of the known pancreatic stellate cell markers smooth muscle actin, vimentin, and desmin (ACTA2, VIM, and DES). Stellate cells have been shown to promote cancer cell survival in vitro (Froeling et al., 2011), but at the same time may restrain PDAC in mouse models (Özdemir et al., 2014; Rhim et al., 2014) or inhibit delivery of chemotherapy (Olive et al., 2009). In patients, the ratio of the smooth muscle actin-stained area to the collagen-stained area has been shown to be predictive of poor outcomes (Erkan et al., 2008). “Activated” stroma was characterized by a more diverse set of genes associated with macrophages, such as the integrin ITGAM and the chemokine ligands CCL13 and CCL18. “Activated” stroma also expressed other genes that point to a role in tumor promotion, including the secreted protein SPARC, the WNT family members WNT2 and WNT5A, gelatinase B (MMP9), and stromelysin 3 (MMP11). The presence in the activated stroma of fibroblast activation protein (FAP), which has previously been related to worse prognosis, suggested that an activated fibroblast state may be partially responsible for the poor outcomes of these patients (Cohen et al., 2008). This observation led to the hypothesis that the “normal” stroma factor may describe a “good” version of stroma, whereas the “activated” stroma factor may describe the activated inflammatory stromal response that previous studies have implicated in disease progression (Hwang et al., 2008; Vonlaufen et al., 2008; Herrera et al., 2013). The multifactor analysis disclosed herein supports a complex, multi-gene model of stroma in PDAC, which may explain why single-gene analyses have yielded mixed results.

Example 4 Identification of Tumor-Specific Subtypes

Independent of normal and stromal factors, two tumor-specific factors were found to define “classical” and “basal-like” subtypes of PDAC. When the presently disclosed samples were split into the two tumor subtypes (FIG. 7A), patients with basal-like subtype tumors had an overall worse median survival of 11 months and 44% 1-year survival, compared to 19 months and 70% 1-year survival for those with classical subtype tumors (p=0.006, FIG. 7B). All cell lines assayed in this study (p<0.001), as well as a majority of metastatic samples (p=0.002), were classified as “basal-like”, suggesting that cell line models represent only one subset of PDAC. These subtypes, as well as their prognostic and/or diagnostic value, were independently validated within the recently published International Cancer Genome Consortium (ICGC) PDAC microarray data set (FIGS. 7C and 7D; Nones et al., 2014). Genes from the “basal-like” factor, including laminins and keratins, were also consistent with basal subtypes previously defined in bladder (Rubio-Viqueira et al., 2006; Alexandrov et al., 2013b; Isella et al., 2015) and breast (Stolze et al., 2015) cancers (FIGS. 7E-7H). Interestingly, genes from the “basal-like” subtype reproduced subtype calls in breast cancer (p<0.001), had prognostic value in breast cancer samples (p<0.001), and reproduced previous subtype calls in bladder cancer (p<0.001). Given these results, a single-sample, cross-platform classifier of the basal-like subtype was developed by training on the presently disclosed microarray data, TCGA bladder cancer data, and Perou breast cancer data. The classifier had 93% cross-validation accuracy and classified TCGA breast cancer data with 92% accuracy during external validation (FIG. 9).

Potential subtypes of PDAC have previously been described by Collisson et al., 2011. The published exemplar genes for the “exocrine-like”, “classical”, and “quasimesenchymal” subtypes were used to cluster normal pancreas, cell lines, and primary PDAC tumors from the presently disclosed cohort (FIG. 10A). The three previous classes were also observed in the data presented herein, but none held prognostic power, either by cluster label or by supervised classification with PAM (FIG. 10B; Ihle et al., 2012). Furthermore, inclusion of the Collisson et al. subtypes in a multivariate Cox regression with the tumor subtypes described herein did not remove the predictive power of the presently disclosed subtyping (p=0.014). Cross-referencing the genes from Collisson et al.'s model with the NMF model disclosed herein yielded three key findings. First, the “exocrine-like” genes overlapped with genes from the exocrine pancreas factor (17/17), and tumors in this cluster had expression indistinguishable from adjacent normal samples in the presently disclosed data set. Second, Collisson et al.'s “classical” genes overlapped with the “classical” subtype genes disclosed herein (20/22), so the name “classical” was retained herein. Third, the gene set associated with the “quasimesenchymal” subtype appeared to be a mixture of genes from the presently disclosed “basal-like” tumor subtype (6/20) and stromal subtypes (6/20). Thus, the presence of stromal factor genes in Collisson et al.'s list of “quasimesenchymal” class genes may explain the apparent mesenchymal-like gene expression that was observed.

“Basal-like” and “classical” tumors were found within both “normal” and “activated” stroma subtypes (FIG. 11A). Differential prognosis among tumor and stroma subtypes was cumulative: “classical” subtype tumors with “normal” stroma (n=24) had the lowest hazard ratio, 0.39 (95% CI [0.21, 0.73]), while “basal-like” subtype tumors with “activated” stroma (n=26) had the highest hazard ratio, 2.28 (95% CI [1.34, 3.87]) (FIG. 11B). In a multivariate Cox regression model that included tumor subtypes, stromal subtypes, and clinical variables (gender, race, T stage, N stage, margin status, adjuvant therapy, histological grade, and age), both classifications were independently associated with survival (stroma subtypes: p=0.037; tumor subtypes: p=0.003).
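A Wald-type interval from Cox regression is assumed here; the text does not state how the intervals were constructed. Under that assumption, the hazard ratio and its 95% CI follow from the coefficient β̂ and its standard error, so each point estimate should sit at the geometric mean of its interval bounds. The figures reported above pass this consistency check. The last line back-calculates the implied standard error for the classical tumor / normal stroma group.

```latex
\begin{aligned}
\widehat{\mathrm{HR}} &= e^{\hat\beta}, \qquad
95\%\ \mathrm{CI} = \left[ e^{\hat\beta - 1.96\,\mathrm{SE}(\hat\beta)},\; e^{\hat\beta + 1.96\,\mathrm{SE}(\hat\beta)} \right]
\;\Longrightarrow\; \widehat{\mathrm{HR}} = \sqrt{\mathrm{CI}_{\text{low}}\,\mathrm{CI}_{\text{high}}} \\
\sqrt{0.21 \times 0.73} &\approx 0.39, \qquad \sqrt{1.34 \times 3.87} \approx 2.28 \\
\mathrm{SE}(\hat\beta) &\approx \frac{\ln 0.73 - \ln 0.21}{2 \times 1.96} \approx \frac{1.246}{3.92} \approx 0.32
\end{aligned}
```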

Although basal-like subtype tumors have a worse prognosis, patients with basal-like subtype tumors showed a strong trend toward better response to adjuvant therapy (p=0.072; FIG. 11C). Among patients with basal-like subtype tumors, adjuvant therapy was associated with a hazard ratio of 0.38 (95% CI [0.14, 1.09]), while among patients with classical subtype tumors, adjuvant therapy was associated with a hazard ratio of only 0.76 (95% CI [0.40, 1.43]). In the presently disclosed cohort, most clinical variables (race, gender, T stage, N stage, differentiation, and tumor cellularity) were not associated with survival, although positive nodal status trended toward significance and positive margin status was significantly associated with worse survival (Table 7). Table 8 shows two-way associations of all subtype calls with clinical and pathological information from the presently disclosed cohort of PDAC patients. No association of tumor or stroma subtype with standard clinical or pathological variables was found, with the notable exception of mucinous features (extracellular mucin and tumor subtype, p=0.042).

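TABLE 8 Summary of Associations with Clinical Covariates and Subtypes
Covariate | Tumor subtype: Classical | Tumor subtype: Basal-like | Fisher's exact p-value | Stroma subtype: Normal | Stroma subtype: Activated | Fisher's exact p-value
Race: Caucasian | 90 | 27 | 0.521 | 26 | 65 | 1
Race: African-American | 13 | 2 | | 3 | 7 |
Gender: F | 64 | 19 | 0.849 | 17 | 43 | 1
Gender: M | 50 | 16 | | 15 | 36 |
T stage: T2 | 16 | 6 | 0.590 | 5 | 14 | 1
T stage: T3 | 87 | 25 | | 25 | 59 |
N stage: N0 | 35 | 9 | 0.532 | 11 | 22 | 0.649
N stage: N1 | 72 | 25 | | 21 | 54 |
Margin: Positive | 38 | 8 | 0.385 | 7 | 22 | 0.629
Margin: Negative | 65 | 22 | | 22 | 49 |
Adjuvant therapy: Yes | 48 | 13 | 0.437 | 10 | 30 | 0.769
Adjuvant therapy: No | 21 | 9 | | 5 | 19 |
Differentiation: Poor | 23 | 11 | 0.479 | 11 | 18 | 0.203
Differentiation: Well | 49 | 16 | | 13 | 44 |
Extracellular mucin: Low | 49 | 24 | 0.042 | 18 | 43 | 0.792
Extracellular mucin: High | 23 | 3 | | 6 | 19 |
Stroma: Normal | 31 | 8 | 0.144 | | |
Stroma: Activated | 57 | 31 | | | |
Each Fisher's exact p-value applies to the covariate as a whole and is listed on the covariate's first row.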

Example 5 Tumor-Specific Subtypes Found in Patient-Derived Xenografts

To assess the tumor or stromal specificity of the presently disclosed signatures, RNAseq was performed on a group of 37 PDX tumors. PDX tumors were composed of human tumor cells surrounded by mouse stroma (FIGS. 12A-12D; Isella et al., 2015). Genes from both of the presently disclosed tumor signatures were expressed as human transcripts, whereas genes from both of the presently disclosed stromal signatures were expressed as mouse transcripts (FIG. 4D, FIG. 13A). PDX RNAseq expression divided the PDX tumors into classical and basal-like groups (FIG. 13B), while the tumors predominantly expressed an activated stromal signature (FIG. 4D). Additionally, while tumor-specific subtype was not predictive of graft success (FIG. 14A), patient tumors with an activated stroma subtype had significantly higher graft success rates than those with a normal stroma subtype or with low amounts of stroma (FIG. 14B; p=0.019). Basal-like subtype tumors also grew faster than classical tumors (p=0.032), as measured by the time that tumors took to reach 200 mm³ (TT200; FIGS. 14C and 14D), a previously used metric of PDX growth (Rubio-Viqueira et al., 2006). Retrospective analysis of patients with matched PDX tumors found that a shorter TT200 was associated with worse recurrence-free survival (p=0.035; FIG. 14E), suggesting that PDX tumor growth rate may reflect patient biology.
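TT200 can be read off a tumor growth curve as the first time the measured volume reaches 200 mm³. The sketch below assumes volumes have already been computed from the raw measurements (the disclosure does not specify a volume formula) and linearly interpolates between the two measurements that bracket the threshold; `time_to_volume` is a hypothetical helper, not the procedure used in the cited studies.

```python
def time_to_volume(days, volumes_mm3, threshold=200.0):
    """Estimate the first day a growth curve reaches `threshold` (mm^3).

    Measurements are sorted by day; the crossing is linearly interpolated
    between the last point below the threshold and the first point at or
    above it. Returns None if the threshold is never reached.
    """
    points = sorted(zip(days, volumes_mm3))
    if not points:
        return None
    if points[0][1] >= threshold:
        return float(points[0][0])
    for (d0, v0), (d1, v1) in zip(points, points[1:]):
        if v0 < threshold <= v1:
            return d0 + (threshold - v0) * (d1 - d0) / (v1 - v0)
    return None


# Illustrative measurements only (not data from the disclosure)
print(time_to_volume([0, 7, 14], [50.0, 150.0, 250.0]))
```

Linear interpolation is the simplest choice; interpolating log-volume instead would better match exponential growth between measurement days.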

Both mouse- and human-specific expression of the Collisson et al. genes were measured in the presently disclosed PDX models. While genes from the “classical” subtype were expressed by human cells in PDX, “quasimesenchymal” transcripts were expressed by a mixture of human and mouse cells, and “exocrine-like” transcripts were infrequently expressed (FIG. 10C). This supported the hypothesis that, while the “classical” subtype was a bona fide group, the “quasimesenchymal” subtype was partially driven by non-tumor contributions from stroma and the “exocrine-like” subtype by contributions from normal pancreas.

Example 6 KRAS Codon Mutations, Tumor-Specific Subtypes, and Race

Studies of KRAS codon mutations have demonstrated that different codon mutations may have differential functions (Ihle et al., 2012; Stolze et al., 2015) and, in some clinical studies, have been shown to be associated with differential outcome. Because PDX tumors are enriched for human tumor cells, KRAS codon mutations were evaluated in the presently disclosed PDX cohort using manually curated RNAseq data. While the overall frequency of KRAS codon mutations was similar to that in a recent study of PDAC (Witkiewicz et al., 2015), the KRAS G12D mutation was significantly overrepresented in the presently disclosed basal-like subtype, while G12V was restricted to the classical subtype (FIG. 14F; p=0.030). Furthermore, KRAS G12V mutations were overrepresented in African-Americans (FIG. 14G; p<0.001). In contrast to basal-like breast cancers, which occur most frequently in African-American women and have a worse prognosis (Carey et al., 2010), African-American patients in the presently disclosed cohort mainly had classical subtype tumors (13 classical vs 2 basal-like). As in other cancers, African-Americans had a worse prognosis after adjusting for tumor subtype (FIG. 11E; p=0.017). African-American patients with classical subtype tumors had a mean survival of 13 months, compared to a median survival of 19 months for Caucasian patients with classical subtype tumors.

Example 7 Other Commonly Mutated Genes and Altered Pathways in PDAC

Loss of SMAD4 has previously been shown to promote tumor growth (Bardeesy et al., 2006; Haeger et al., 2015). Consistent with previous PDX studies of PDAC (Garrido-Laguna et al., 2011), loss of SMAD4 was associated with graft success in PDX models (FIG. 14H, FIGS. 15A-15G; p=0.044). Furthermore, in the presently disclosed PDX cohort, SMAD4 expression was significantly higher in classical than in basal-like subtype PDX tumors (FIG. 14I; p=0.015), consistent with the observation that SMAD4 loss confers a more aggressive phenotype.

Using mutation, genomic subtype (Waddell et al., 2015), and gene expression (Nones et al., 2014) data from the publicly available ICGC data, in which the presently disclosed subtypes and their prognostic value were shown to be recapitulated, significantly mutated genes and pathways in PDAC were also evaluated, including those recently identified through whole-exome sequencing of microdissected primary PDAC tumors (Jones et al., 2008; Biankin et al., 2012; Waddell et al., 2015; Witkiewicz et al., 2015). No significant associations were found between the presently disclosed expression subtypes and these mutationally altered pathways, i.e., TGFβ, RB, NOTCH, CTNNB1, SWI/SNF, and DNA repair (FIG. 16). Furthermore, no overlap was found between the presently disclosed subtypes and the recently identified genomic subtypes or response to platinum therapy (Waddell et al., 2015). Consistent with this, a recent comprehensive study of somatic mutations in long-term survivors of PDAC suggested that somatic mutations alone may not be sufficient to explain clinical outcome (Dal Molin et al., 2015).

Given the overlap of the presently disclosed classical subtype with that of Collisson et al., 2011, it was not surprising that the presently disclosed classical subtype was also enriched for genes associated with GATA6 overexpression (Zhang et al., 2008; FIG. 17A, FIG. 11A). GATA6 has been found to promote epithelial cell differentiation (Zhang et al., 2008; Zhong et al., 2015). More detailed histological markers of differentiation were evaluated in the presently disclosed samples, and samples with greater than 10% extracellular mucin, a marker of differentiation, were found to consist mostly of classical subtype tumors (88.5%, n=23), compared to only 11.5% (n=3) basal-like subtype tumors (FIGS. 18A-18C, p=0.042; Table 8). Consistent with the increased presence of extracellular mucin, the presently disclosed classical subtype was enriched for genes upregulated in mucinous ovarian cancer (WAMUNYOKOLI_OVARIAN_CANCER_GRADES_1_2_UP; Wamunyokoli et al., 2006). Interestingly, the presently disclosed basal-like subtype was enriched for genes related to KRAS activation and STK11 loss in a lung cancer mouse model in which STK11-deficient tumors demonstrated shorter latency and more frequent metastasis (Ji et al., 2007). One sample with STK11 inactivation was found in the ICGC data, and this sample was of the basal-like subtype (FIG. 16). Notably, the presently disclosed subtypes were not associated with other known signaling pathways in PDAC, including Fanconi anemia, DNA repair, chromatin remodeling, beta-catenin, RB, ARF, and G1 (FIG. 11A). However, all of these pathways except beta-catenin were considerably differentially expressed in cell lines compared to patient tumors, suggesting that gene expression in cell lines may misrepresent most tumors.

Example 8 Tumor-Specific Subtypes Suggested Low Intrapatient Heterogeneity Between Primary and Metastatic Lesions

It is likely that only a subset of genes is relevant to the question of intra- and inter-patient heterogeneity in PDAC. Many methods exist to pre-select genes for supervised analysis (Carey et al., 2010), but selecting the most differentially expressed genes is a common preprocessing step in unsupervised analysis (Bardeesy et al., 2006). When matched samples of metastatic and primary lesions were clustered using the 50 most differentially expressed genes among all matched samples, the samples separated primarily by organ site rather than by patient (FIGS. 19A and 19C). In contrast, when the 25 top-ranked exemplar genes from each of the “basal-like” and “classical” factors were used, samples from the same patient clustered closer together and were less dependent on organ site (FIGS. 19B and 19D).
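The disclosure does not specify how the “most differentially expressed” genes were ranked in this unsupervised setting. A common proxy is variance across samples; the sketch below, using the hypothetical helper `top_variable_genes`, selects the n highest-variance genes from a gene-to-values mapping as a stand-in for that preprocessing step before clustering.

```python
from statistics import pvariance


def top_variable_genes(expr, n):
    """Return the n genes with the highest variance across samples.

    `expr` maps gene name -> list of expression values (one per sample).
    Ties are broken alphabetically so the selection is deterministic.
    """
    ranked = sorted(expr, key=lambda gene: (-pvariance(expr[gene]), gene))
    return ranked[:n]


# Illustrative values only
expr = {"A": [1, 1, 1], "B": [0, 5, 10], "C": [1, 2, 3], "D": [3, 2, 1]}
print(top_variable_genes(expr, 2))
```

Median absolute deviation is a common robust alternative to variance when a few outlying samples could dominate the ranking.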

This was further illustrated in a focused analysis of two patients (FIGS. 19A-19G), whose tumor samples appeared patient-specific when the presently disclosed tumor subtype gene list was used, but clustered by site when differentially expressed genes were used. Overall, the presently disclosed tumor subtype gene list showed higher similarity (mean Pearson's ρ=0.53) among all samples from the same patient than did the differentially expressed gene list (ρ=0.32, t-test p≤0.001). Furthermore, the presently disclosed tumor subtype gene list produced much lower similarity among samples from the same organ site across different patients (ρ=0.04) than the differentially expressed gene list (ρ=0.34, p≤0.001). This similarity of tumor gene expression among tumors within the same patient suggested high inter-patient tumor heterogeneity overall and low heterogeneity between primary and metastatic sites. However, examples of intra-patient heterogeneity were observed between metastatic sites. For example, lung metastases, even those from patients with “basal-like” tumors in other locations, clustered exclusively with the “classical” tumors, suggesting that some intra-patient heterogeneity may exist among metastatic sites and supporting the previously reported divergent patterns of failure in PDAC (Haeger et al., 2015).
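The similarity summaries above (mean Pearson's ρ within patient and within organ site) can be sketched as averaging pairwise correlations over sample pairs that share a label. The helpers below are hypothetical, use plain Python, and assume each sample's profile is a vector over the same ordered gene list; the t-test comparison between gene lists is not reproduced.

```python
from itertools import combinations
from statistics import fmean


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mean_within_group_correlation(profiles, groups):
    """Mean Pearson correlation over all sample pairs sharing a group label.

    `profiles` maps sample id -> expression vector over a fixed gene list;
    `groups` maps sample id -> label (e.g. patient id or organ site).
    At least one pair must share a label.
    """
    rhos = [
        pearson(profiles[s], profiles[t])
        for s, t in combinations(sorted(profiles), 2)
        if groups[s] == groups[t]
    ]
    return fmean(rhos)
```

Calling the function once with patient labels and once with organ-site labels, for each candidate gene list, yields the two kinds of comparison described in this paragraph.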

Discussion of Examples 1-8

The studies disclosed herein represent the largest investigation of primary and metastatic PDAC gene expression to date. NMF was used to identify novel prognostic and/or diagnostic subtypes of PDAC which may have been previously obscured by confounding normal and stromal tissue. The identification of normal-, tumor-, and stroma-specific gene expression signatures was supported by both their overlap with previously identified gene lists and their expression in appropriate tissue types. The presently disclosed tumor subtypes were further supported by their relationship to previously identified basal tumor subtypes in breast and bladder cancers and their prognostic and/or diagnostic relevance in external cohorts. The present findings of two different stroma subtypes may help explain the differential effects of stroma previously seen in preclinical models.

Tumor- and stroma-specific gene expression classified PDAC into four distinct subtypes with prognostic and/or diagnostic relevance. The orthogonal nature of tumor- and stroma-specific subtypes suggested an important interplay in patient tumors that will need to be taken into account as stroma- and immune-modulating therapies are studied. In the presently disclosed cohort, patients with basal-like tumors appeared to derive more benefit from adjuvant therapy. Whether basal-like and classical subtypes are associated with response to specific therapies can be studied further as more effective therapies become available. One challenge will be defining preclinical model systems that recapitulate these subtypes, because the presently disclosed results suggested that traditional cell lines lack the classical subtype. Although it has been demonstrated that PDX models recapitulate tumor-specific subtypes, these models alone may not be sufficient, either because they lack human stroma or because the activated stroma subtype is overrepresented among successfully grafted tumors. Thus, more detailed characterization of genetically engineered mouse models of PDAC can be employed to determine which models best reflect both the presently disclosed tumor- and stroma-specific subtypes.

Recent exome sequencing studies have confirmed commonly mutated genes in PDAC but have not uncovered mutations that clearly confer survival differences (Jones et al., 2008; Waddell et al., 2015; Witkiewicz et al., 2015). In fact, exome sequencing of a cohort of very long-term survivors of PDAC (Dal Molin et al., 2015) found no differences in somatic mutations that would explain the more favorable biology of tumors from these rare patients compared to the majority of patients with PDAC, suggesting that examining somatic mutations alone may not be sufficient to understand the biological and clinical differences among PDAC tumors. Furthermore, exome sequencing studies and studies of microdissected samples are limited to the tumor compartment and overlook the stroma compartment, which has been shown to be biologically critical in PDAC, with both tumor-promoting and tumor-inhibiting effects. The results provided herein suggested that RNA subtypes may better capture the molecular landscape of PDAC and its relationship to patient outcome. As such, the RNA subtypes disclosed herein may reflect the broad effects of somatic mutations while also capturing the importance of the neoplastic stroma.

These results provide new insight into the molecular composition of PDAC which may be used for precision medicine. Furthermore, knowledge of these subtypes and their prognostic and/or diagnostic value can provide decision support in a clinical setting where the choice and timing of therapies can be critical.

Example 9 Construction of a Cross-Platform Basal-Like Classifier

Having established a method for classifying cohorts of PDAC expression data into basal-like and classical samples, a more clinically applicable classification scheme that works on single samples was constructed. Such a single-sample classifier can be valuable in a clinical setting, where access to a large cohort of comparison cases is impractical. Furthermore, the ability of such a classifier to work across gene expression platforms and across relevant cancer types was assessed.

As such, a platform-independent classifier was developed and tested to discriminate “basal-like” samples from all other samples across various cancers, given a sample's individual gene expression profile. Rank-based classifiers such as the Top Scoring Pair (TSP; Leek, 2009) and kTSP (Afsari et al., 2014) depend only on the relative ranks of gene expression within a sample, which makes such classifiers robust to platform-specific effects and to study-to-study variation due to data normalization and preprocessing (Patil et al., 2015).

Briefly, the kTSP approach selects k pairs of genes A and B such that gene A expression > gene B expression implies membership in class 1, and otherwise implies membership in class 2. Following feature selection, the default decision rule in Afsari et al., 2015 weights each TSP equally in class prediction (“voting”), even though some TSPs may discriminate between classes better than others. As set forth herein, the kTSP approach of Afsari et al., 2015 was extended by implementing a custom decision rule that inputs the k selected gene pairs into a penalized logistic regression classifier to estimate the relative contribution of each of the k selected TSPs to predicting class membership (defined here as basal-like versus otherwise), similar to Shi et al., 2011. In fitting the model, class membership was the binary outcome variable, and each covariate corresponded to one TSP: a binary vector that took the value 1 for a sample if gene A > gene B in that sample, and 0 otherwise.
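The binary TSP covariates described above can be sketched as follows. Because each feature depends only on whether gene A is expressed above gene B within the same sample, any strictly increasing transformation of that sample's values (for example, a log transform or a platform-specific rescaling) leaves the features unchanged, which is the robustness property noted for rank-based classifiers. The function names and example values are hypothetical.

```python
import math


def tsp_features(sample, pairs):
    """Binary TSP covariates for one sample.

    `sample` maps gene -> expression value; `pairs` is a list of (gene_a, gene_b).
    Each feature is 1 if gene_a is expressed above gene_b in this sample, else 0
    (ties count as 0).
    """
    return [1 if sample[a] > sample[b] else 0 for a, b in pairs]


def design_matrix(samples, pairs):
    """One row of TSP indicators per sample, ready for a logistic regression."""
    return [tsp_features(s, pairs) for s in samples]


# Illustrative pairs and values (not data from the disclosure)
pairs = [("KRT6A", "BCAS1"), ("MET", "TFF3")]
raw = {"KRT6A": 120.0, "BCAS1": 30.0, "MET": 5.0, "TFF3": 80.0}
logged = {g: math.log2(v + 1.0) for g, v in raw.items()}  # monotone transform
assert tsp_features(raw, pairs) == tsp_features(logged, pairs) == [1, 0]
```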

A penalized logistic regression model was fit using the ncvreg package (Breheny & Huang, 2011) to account for potential correlation between TSPs (ridge penalty) and to remove TSPs unhelpful in prediction given the presence of other features in the model (MCP penalty). Given the fitted model and a new sample's expression profile, a predicted probability of basal-like class membership could be obtained.
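To make the two penalties concrete, the sketch below evaluates a combined MCP and ridge penalty on a vector of TSP coefficients. It follows our understanding of the form used by ncvreg when `alpha` < 1 (MCP applied with λα plus a ridge term with λ(1 − α)), and it assumes ncvreg's default MCP concavity γ = 3, which the disclosure does not state. It illustrates the penalty only and is not a model-fitting routine; the function names are hypothetical.

```python
def mcp_penalty(beta, lam, gamma=3.0):
    """Minimax concave penalty (MCP) for one coefficient.

    Grows like the lasso (lam * |beta|) near zero, then flattens and is
    constant once |beta| exceeds gamma * lam, so large coefficients receive
    no additional shrinkage.
    """
    b = abs(beta)
    if b <= gamma * lam:
        return lam * b - b * b / (2.0 * gamma)
    return gamma * lam * lam / 2.0


def mcp_ridge_penalty(coefs, lam, alpha=0.5, gamma=3.0):
    """MCP with lam * alpha plus a ridge term (lam * (1 - alpha) / 2) * beta^2.

    alpha = 1 is pure MCP; smaller alpha adds ridge shrinkage, which
    stabilises estimates for correlated TSP features. The intercept is not
    penalised, so it should not be included in `coefs`.
    """
    l1, l2 = lam * alpha, lam * (1.0 - alpha)
    return sum(mcp_penalty(b, l1, gamma) + 0.5 * l2 * b * b for b in coefs)
```

With `alpha=0.5`, as used below, half of λ drives MCP-based selection of TSPs and the other half drives ridge shrinkage.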

To build the presently disclosed classifier to predict the basal-like class across various cancers, the classifier was trained on a “metadataset” consisting of the TCGA Bladder (RNA-seq, 20533 genes), UNC Pancreas (Microarray, 19749 genes), and Perou Breast Cancer (Microarray, 17631 genes) data sets, totaling 788 samples. Each data set was reduced to the genes common to all three studies and then to the 50-gene signature described herein. The Perou Breast Cancer data set was further filtered to remove genes that had missing values in more than 10 samples, leaving 11526 genes. The remaining missing data were imputed using the impute package in R with default parameters (Hastie et al., impute: Imputation for microarray data, R package version 1.42.0). Only 29 of the 50 genes from the original gene signature remained for feature selection after filtering. Because of this small number, a larger 500-gene set that encompassed the original 50 genes and was derived in a similar fashion was used instead. From this larger gene set, 302 genes were found across all three training data sets.
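The gene-harmonization steps above (dropping genes with too many missing values, intersecting genes across studies, and restricting to a signature) can be sketched as a small filter. The helper and data layout are hypothetical; for brevity the missing-value filter is applied to every data set rather than only the Perou Breast Cancer set, and the k-nearest-neighbor imputation performed by the impute package is not reproduced.

```python
def harmonize(datasets, signature, max_missing=10):
    """Return signature genes usable across all data sets, in sorted order.

    `datasets` maps data set name -> {gene: list of values, None if missing}.
    A gene is kept if it belongs to `signature`, is present in every data set,
    and has at most `max_missing` missing values in each of them.
    """
    common = set(signature)
    for data in datasets.values():
        common &= set(data)
    return [
        gene
        for gene in sorted(common)
        if all(sum(v is None for v in data[gene]) <= max_missing
               for data in datasets.values())
    ]
```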

Basal-like samples were identified in the TCGA bladder and Perou Breast Cancer data sets from their associated clinical annotation files; in the UNC Pancreas data, the basal-like clustering calls from the present disclosure were used. Given the known classes (basal-like versus otherwise) and the gene expression profiles in each data set, feature selection was performed using the switchBox package (Afsari et al., 2015) to select k TSPs from the 302 candidate genes, resulting in 16 selected TSPs. The ncvreg function (Breheny & Huang, 2011) was then applied using the MCP penalty and an alpha parameter of 0.5, giving equal weight to the ridge penalty, which accounts for correlation between TSPs, and to the MCP penalty, which performs feature selection. The penalty parameter was chosen via leave-one-out cross-validation using the cv.ncvreg function (788 folds).
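A simplified sketch of kTSP feature selection is shown below: each candidate pair is scored by the between-class difference in how often gene A exceeds gene B, pairs are oriented so that A > B points toward class 1 (here, basal-like), and the highest-scoring pairs are taken greedily without reusing a gene. This follows the general kTSP idea; it is not the switchBox implementation, which additionally uses a secondary score to break ties and selects k from a range of candidate values. The function names are hypothetical.

```python
from itertools import combinations


def tsp_delta(samples, labels, a, b):
    """P(a > b | class 1) - P(a > b | class 0) for one candidate gene pair."""
    def frac(cls):
        group = [s for s, y in zip(samples, labels) if y == cls]
        return sum(s[a] > s[b] for s in group) / len(group)
    return frac(1) - frac(0)


def select_ktsp(samples, labels, genes, k):
    """Greedily keep the k highest-|delta| gene pairs that share no genes.

    Each returned pair (A, B) is oriented so that A > B favours class 1.
    """
    scored = []
    for a, b in combinations(sorted(genes), 2):
        d = tsp_delta(samples, labels, a, b)
        scored.append((abs(d), a, b) if d >= 0 else (abs(d), b, a))
    scored.sort(key=lambda t: -t[0])  # stable sort keeps tie order deterministic
    chosen, used = [], set()
    for _score, a, b in scored:
        if a not in used and b not in used:
            chosen.append((a, b))
            used.update((a, b))
            if len(chosen) == k:
                break
    return chosen
```

Scoring every pair is quadratic in the number of genes, which is manageable for a few hundred candidates such as the 302 genes used here.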

The final model described herein, derived from the larger 500-gene signature, contained 14 TSPs. The fitted estimates are shown in Table 9. When the pairwise Spearman correlation between samples was calculated across the classifier's genes, samples from the basal-like state (orange) tended to cluster together (see FIG. 16). The predictions described herein also tended to match the known classes for each sample regardless of platform or tumor type.

TABLE 9 Fitted Estimates for the Final Model

Gene A | Gene B | Coefficient | Estimated increase in odds of basal class membership when A > B (e^coefficient)
CD109 | GPR160 | 0.87 | 2.38
SLC2A1 | AGR2 | 1.22 | 3.39
KRT16 | SLC44A4 | 0.52 | 1.68
CTSL2 | TMEM45B | 1.43 | 4.17
KRT6A | BCAS1 | 0.70 | 2.01
B3GNT5 | VSIG2 | 0.41 | 1.51
MET | TFF3 | 0.72 | 2.06
CHST6 | PLA2G10 | 0.80 | 2.24
SERPINB5 | HPGD | 0.76 | 2.13
DCBLD2 | PLS1 | 1.40 | 4.07
IL20RB | FAM3D | 1.33 | 3.79
PPP1R14C | SYTL2 | 1.58 | 4.85
NAB1 | PLEKHA6 | 0.41 | 1.50
MSLN | CAPN9 | 1.58 | 4.83
(Intercept) | | −7.16 |

To classify each sample, the expression levels of the two genes in each pair in Table 9 were compared: for each gene pair, if Gene A expression was greater than Gene B expression, the coefficient for that gene pair was added to a running sum. If the sum of all such coefficients and the intercept from Table 9 was greater than zero, the sample was classified as basal (see EQUATION 1).
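The decision rule just described can be written as a short function. The sketch below encodes the rounded coefficients from Table 9, returns the linear score (intercept plus the coefficients of all pairs with Gene A > Gene B), and calls a sample basal when that score is greater than zero. The conversion to a probability assumes the standard logit link of the penalized logistic regression; because Table 9 reports coefficients to two decimals, samples with scores very close to zero could be called differently by the full-precision model. All names are hypothetical.

```python
import math

# Table 9: (Gene A, Gene B, coefficient)
TABLE_9_PAIRS = [
    ("CD109", "GPR160", 0.87), ("SLC2A1", "AGR2", 1.22), ("KRT16", "SLC44A4", 0.52),
    ("CTSL2", "TMEM45B", 1.43), ("KRT6A", "BCAS1", 0.70), ("B3GNT5", "VSIG2", 0.41),
    ("MET", "TFF3", 0.72), ("CHST6", "PLA2G10", 0.80), ("SERPINB5", "HPGD", 0.76),
    ("DCBLD2", "PLS1", 1.40), ("IL20RB", "FAM3D", 1.33), ("PPP1R14C", "SYTL2", 1.58),
    ("NAB1", "PLEKHA6", 0.41), ("MSLN", "CAPN9", 1.58),
]
TABLE_9_INTERCEPT = -7.16


def basal_score(sample, pairs=TABLE_9_PAIRS, intercept=TABLE_9_INTERCEPT):
    """Intercept plus the coefficient of every pair with Gene A > Gene B."""
    return intercept + sum(coef for a, b, coef in pairs if sample[a] > sample[b])


def basal_probability(sample):
    """Predicted probability of basal-like membership under a logit link."""
    return 1.0 / (1.0 + math.exp(-basal_score(sample)))


def is_basal(sample):
    """Basal call: the linear score is greater than zero."""
    return basal_score(sample) > 0
```

A sample in which every Gene A is expressed above its partner Gene B scores 13.73 − 7.16 = 6.57 and is called basal; a sample in which no pair satisfies A > B scores −7.16.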

To validate the 14-TSP classifier, the presently disclosed model was applied to two independent data sets: the TCGA Breast Cancer (RNAseq) data set and the ICGC pancreatic cancer data set (Microarray). The predictions matched the existing subtype calls well in the independent TCGA data set, with a classification accuracy of 92.3%. The ICGC pancreas data set was the only validation data set without existing subtype calls; for this data set, the TSP predictions agreed less well with the presently disclosed clustering-based calls, with a match rate of 85.5%. Finally, the Spearman correlation of gene expression as a whole was much lower between the ICGC platform and any of the RNAseq or Agilent Microarray data sets described herein.

Accordingly, the present disclosure demonstrated excellent within-training-set performance of the described classifier across multiple platforms, as well as accurate prediction in an independent RNAseq data set.

Extending the methodology described above, a stroma-specific classifier (activated versus normal stroma; see EQUATION 2) and a tumor-specific classifier (basal versus classical; see EQUATION 3) were trained within the pancreatic cancer data only. Table 10 and Table 11 show the coefficients of the fitted models for classifying between activated and normal stroma subtypes and between basal-like and classical subtypes, respectively.

TABLE 10 Fitted Estimates for the Pancreas-specific Stromal Model

Gene A | Gene B | Coefficient | Estimated increase in odds of activated stroma class membership when A > B (e^coefficient)
ITGA11 | SCRG1 | 0.67 | 1.95
COL5A1 | IGF1 | 1.25 | 3.48
COL11A1 | ANGPTL7 | 3.23 | 25.37
MMP11 | ACTG2 | 1.67 | 5.30
FNDC1 | SYNM | 1.43 | 4.18
ZNF469 | MYH11 | 1.51 | 4.54
RBPMS2 | RERGL | 1.25 | 3.49
COL1A1 | COL1A2 | 0.18 | 1.20
Intercept | | −6.17 |

TABLE 11 Fitted Estimates for the Pancreas-specific Tumor Subtype Model

Gene A | Gene B | Coefficient | Estimated increase in odds of basal class membership when A > B (e^coefficient)
GPR87 | MS4A8B | 1.084442 | 2.96
KRT6C | BTNL8 | 2.622242 | 13.77
ANXA8L2 | PLA2G10 | 2.73881 | 15.47
KRT6A | KCNE3 | 1.891903 | 6.63
C16orf74 | DDC | 1.898285 | 6.67
SCEL | MYO1A | 2.161549 | 8.68
DCBLD2 | PLS1 | 2.189532 | 8.93
FAM83A | REG4 | 2.855056 | 17.38
PTGES | ATP10B | 1.674513 | 5.34
Intercept | | −9.255835 |
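Applying the same sum-and-threshold rule to Tables 10 and 11 yields a combined tumor and stroma call for a single pancreatic sample. The sketch below assumes that EQUATIONS 2 and 3 use the same rule as EQUATION 1 (a positive sum of the intercept and applicable coefficients indicates activated stroma or a basal-like tumor, respectively), which the text implies by extending the methodology but does not spell out. The helper names are hypothetical.

```python
# Table 10: activated (vs normal) stroma; (Gene A, Gene B, coefficient)
STROMA_PAIRS = [
    ("ITGA11", "SCRG1", 0.67), ("COL5A1", "IGF1", 1.25), ("COL11A1", "ANGPTL7", 3.23),
    ("MMP11", "ACTG2", 1.67), ("FNDC1", "SYNM", 1.43), ("ZNF469", "MYH11", 1.51),
    ("RBPMS2", "RERGL", 1.25), ("COL1A1", "COL1A2", 0.18),
]
STROMA_INTERCEPT = -6.17

# Table 11: basal-like (vs classical) tumor subtype
TUMOR_PAIRS = [
    ("GPR87", "MS4A8B", 1.084442), ("KRT6C", "BTNL8", 2.622242),
    ("ANXA8L2", "PLA2G10", 2.73881), ("KRT6A", "KCNE3", 1.891903),
    ("C16orf74", "DDC", 1.898285), ("SCEL", "MYO1A", 2.161549),
    ("DCBLD2", "PLS1", 2.189532), ("FAM83A", "REG4", 2.855056),
    ("PTGES", "ATP10B", 1.674513),
]
TUMOR_INTERCEPT = -9.255835


def pair_score(sample, pairs, intercept):
    """Intercept plus the coefficient of every pair with Gene A > Gene B."""
    return intercept + sum(coef for a, b, coef in pairs if sample[a] > sample[b])


def pdac_subtype(sample):
    """Return (tumor subtype, stroma subtype) for one sample's expression dict."""
    tumor_score = pair_score(sample, TUMOR_PAIRS, TUMOR_INTERCEPT)
    stroma_score = pair_score(sample, STROMA_PAIRS, STROMA_INTERCEPT)
    tumor = "basal-like" if tumor_score > 0 else "classical"
    stroma = "activated" if stroma_score > 0 else "normal"
    return tumor, stroma
```

Each call depends only on comparisons within one sample, so a single sample can be classified without reference to a cohort, as intended for the single-sample classifier of Example 9.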

Example 10 Exemplary Clinical Approaches to Care

FIGS. 20 and 21 show exemplary, non-limiting clinical approaches to care based on tumor and stroma subtype determinations employing EQUATIONS 2 and 3 above.

FIG. 20 presents exemplary treatment considerations for patients with a pancreatic mass that has no evidence of distant (metastatic) spread to other organs (i.e., the tumor is confined to the pancreas). In this case, the patient undergoes a biopsy. If the stroma subtype is determined to be normal using EQUATION 2, the patient proceeds to surgery. However, as agents such as those listed in Table 3 become available or are developed against the genes in Table 3, a patient with a normal stroma subtype also considers neoadjuvant therapy with the Table 3 agents prior to surgery. If the patient is determined to have an activated stroma subtype, the patient considers radiation and the other stroma modulation therapies noted in FIG. 20, including but not limited to hyaluronidase, hedgehog inhibition, modified vitamin D, vitamin D derivatives or compounds, anti-cytokine agents, or agents listed in Table 2 or directed against the genes listed in Table 2. Additionally, the patient considers the therapies recommended based on tumor subtype (classical or basal-like) as described below.

If the biopsy shows a classical subtype as determined using EQUATION 3, the patient proceeds directly to surgery or, prior to surgery, commences treatment with one or more agents listed in Table 5 or Table 6 or directed against the genes listed in Tables 5 and 6. If the patient has a basal-like tumor, surgery alone would not be adequate. Therefore, this patient is recommended to undergo chemotherapy with the agents listed in FIG. 20 and/or with agents listed in Tables 4 and 6 and/or agents directed against the genes listed in Tables 4 and 6.

FIG. 21 shows exemplary treatment considerations for patients with a pancreatic mass that has evidence of distant (metastatic) spread to other organs. In this case, the patient also undergoes a biopsy. If the biopsy shows a classical subtype as per EQUATION 3, the patient considers 5-fluorouracil- or platinum-based therapy. In some instances, other chemotherapies are also considered. As other agents such as those listed in Tables 5 and 6 become available or are developed against the genes listed in Tables 5 and 6, these therapies are considered in conjunction with the chemotherapy. If the patient has a basal-like tumor, cisplatin- or oxaliplatin-based therapies or gemcitabine, as listed in FIG. 21, are considered. In some instances, other chemotherapies are appropriate. In addition, the agents listed in Tables 4 and 6 or agents directed against the genes listed in Tables 4 and 6 are added to the chemotherapy.

If the patient has a normal stroma subtype as per EQUATION 2, no additional therapy beyond that based on tumor subtype is considered, although immunotherapies to augment the immune response can be considered. As additional agents such as those listed in Table 3 become available or are developed against the genes in Table 3, a patient with a normal stroma subtype considers using Table 3 agents in conjunction with the tumor subtype-specific therapy regimen, such as chemotherapy. For patients with activated stroma, radiation and the other stroma modulation therapies listed in FIG. 21, including but not limited to hyaluronidase, hedgehog inhibition, modified vitamin D, vitamin D derivatives or compounds, and anti-cytokine agents, are considered in conjunction with the tumor subtype-specific therapy. In addition, the agents listed in Table 2 and/or agents directed against the genes listed in Table 2 are also considered.

REFERENCES

The references listed below as well as all references cited in the specification including, but not limited to patents, patent application publications, journal articles, and database entries (e.g., GENBANK® biosequence database entries including all annotations and references cited therein) are incorporated herein by reference to the extent that they supplement, explain, provide a background for, or teach methodology, techniques, and/or compositions employed herein. With respect to GENBANK® biosequence database entries, if a sequence listed herein is or has been updated with a new sequence, it is understood that the instant disclosure also incorporates by reference to the sequence listed herein any such new sequences.

-   Afsari et al. (2014) Ann Appl Stat 8:1469-1491. -   Afsari et al. (2015) Bioinformatics 31:273-274 -   Ahmad et al. (2001)Am J Gastroenterol 96:2609-2615. -   Albert et al. (1992) J Virol 66:5627-5630. -   Alexandrov eta. (2013a) Cell Rep 3:246-259. -   Alexandrov et al. (2013b) Nature 500:415-421. -   Alexay et al. (1996) Proc SPIE 2705, Fluorescence Detection IV,     6363. -   Ausubel et al. (2002) Short Protocols in Molecular Biology, Fifth     ed. Wiley, New York, N.Y., United States of America. -   Ausubel et al. (2003) Current Protocols in Molecular Biology, John     Wylie & Sons, Inc., New York, N.Y., United States of America. -   Bachem et al. (2005) Gastroenterol 128:907-921. -   Bardeesy et al. (2006) Genes Dev 20:3130-3146. -   Bej et al. (1991) Appl Environ Microbiol 57:3529-3534. -   Biankin et al. (2012) Nature 491:399-405. -   Biton et al. (2014) Cell Rep 9:1235-1245. -   Boom et al. (1990) J Clin Microbiol 28:495-503. -   Boyle & Levin (2008) World Cancer Report 2008. Lyon, International     Agency for Research on Cancer. -   Breheny & Huang (2011) Ann Appl Stat 5:232-253. -   Buffone et al. (1991) Clin Chem 37:1945-1949. -   Busch et al. (1992) Transfusion 32:420-425. -   Cancer Genome Atlas Research Network, The (2011) Nature 474:609-615. -   Cancer Genome Atlas Research Network, The (2012a) Nature     487:330-337. -   Cancer Genome Atlas Research Network, The (2012b) Nature     489:519-525. -   Cancer Genome Atlas Research Network, The (2012c) Nature 490:61-70. -   Cancer Genome Atlas Research Network, The (2013a) Nature 497:67-73. -   Cancer Genome Atlas Research Network, The (2013b) Nature 499:43-49. -   Cancer Genome Atlas Research Network, The (2013c) New Eng J Med     368:2059. -   Cancer Genome Atlas Research Network, The (2014a) Nature     507:315-322. -   Cancer Genome Atlas Research Network, The (2014b) Nature     511:543-550. -   Carey et al. (2010) Nat Rev Clin Oncol 7:683-692. -   Carter et al. (2012) Nat Biotechnol 30:413-421. 
-   Cha & Thilly (1993) PCR Methods Appl 3:S18-S29. -   Cleary et al. (2004) J Am Coll Surg 198:722-731. -   Cohen et al. (2008) Pancreas 37:154-158. -   Cohen et al. (2008) Pancreas 37:154-158. -   Collisson et al. (2011) Nat Med 17:500-503. -   Conlon et al. (1996) Ann Surg 223:273-279. -   Conroy et al. (2011) New Eng J Med 364:1817-1825. -   Conway et al. (2012) Bioinformatics 28:i172-i178. -   Cousins et al. (1992) J Clin Microbiol 30:255-258. -   Crnogorac-Jurcevic et al. (2002) Oncogene 21:4587-4594. -   Dal Molin et al. (2015) Clin Cancer Res 21:1944-1950. -   Damrauer et al. (2014) Proc Nat Acad Sci USA 111:3110-3115. -   DeOliveira et al. (2006) Annals Surg 244:931-937. -   DeRisi et al. (1996) Nat Genet 14:457-460. -   Dubiley et al. (1997) Nucl Acids Res 25:2259-2265. -   Duda et al. (2012) Pattern Classification. John Wiley & Sons, New     York, N.Y., United States of America. -   Eisenberg & Levanon (2003) Trends Genet 19:362-365. -   Englert (2000) in Schena, ed., Microarray Biochip Technology, pp.     231-246, Eaton Publishing, Natick, Mass., United States of America. -   Eppsteiner et al. (2009) Annals Surg 249:635-640. -   Erkan et al. (2008) Clin Gastroenterol Hepatol 6:1155-1161. -   Espejo et al. (2002) Biochem J 367:697-702. -   Fang et al. (2002) Chembiochem 3:987-991. -   Ferrone et al. (2008) J Gastrointest Surg 12:701-706. -   Fodor et al. (1991) Science 251:767-773. -   Fodor et al. (1993) Nature 364:555-556. -   Froeling et al. (2011) Gastroenterol 141:1486-1497. -   Garrido-Laguna et al. (2011) Clin Cancer Res 17:5793-5800. -   Gress et al. (2001) Annals Internal Med 134:459-464. -   Guedon et al. (2000) Anal Chem 72(24):6003-6009. -   Haab et al. (2001) Genome Biol 2:RESEARCH0004. -   Haeger et al. (2015) Oncogene Apr. 20. doi: 10.1038/onc.2015.112.     [Epub ahead of print]. -   Hamel et al. (1995) J Clin Microbiol 33:287-291. -   Han et al. (2006) Pancreas 32:271-275. -   Heaton et al. (2001) Proc Natl Acad Sci USA 98(7):3701-3704. 
-   Hermanson (1990) Bioconjugate Techniques, Academic Press, San Diego,     Calif., United States of America. -   Herrera et al. (2013) Clin Cancer Res 19:5914-5926. -   Herrewegh et al. (1995) J Clin Microbiol 33:684-689. -   Hoadley et al. (2014) Cell 158:929-944. -   Houseman et al. (2002) Nat Biotechnol 20:270-274. -   Hwang et al. (2008) Cancer Res 68:918-926. -   Iacobuzio-Donahue et al. (2003) Am J Pathol 162:1151-1162. -   Iacobuzio-Donahue et al. (2009) J Clin Oncol 27:1806-1813. -   Ihle et al. (2012) J Natl Cancer Inst 104:228-239. -   Isella et al. (2015) Nat Genet 47:312-319. -   Izraeli et al. (1991) Nucl Acids Res 19:6051. -   Ji et al. (2007) Nature 448:807-810. -   Jones et al. (2008) Science 321:1801-1806. -   Kim et al. (2013) Genome Biol 14:R36. -   Kohsaka & Carson (1994) J Clin Lab Anal 8:452-455. -   Krapp et al. (1998) Genes Dev 12:3752-3763. -   Lanciotti et al. (1992) J Clin Microbiol 30:545-551. -   Leek (2009) Bioinformatics 25:1203-1204. -   Linz et al. (1990)J Clin Chem Clin Biochem 28:5-13. -   Lisle et al. (2001) BioTechniques 30:1268-1272. -   Liu & Hlady (1996) Colloids Surfaces B Biointerfaces 8:25-37. -   Lockhart et al. (1996) Nat Biotechnol 14:1675-1680. -   Logsdon et al. (2003) Cancer Res 63:2649-2657. -   Louvet et al. (2005) J Clin Oncol 23:3509-3516. -   MacBeath & Schreiber (2000) Science 289:1760-1763. -   Mace et al. (2000) in Schena, ed., Microarray Biochip Technology,     pp. 39-64, Eaton Publishing, Natick, Mass., United States of     America. -   Maier et al. (1994) J Biotechnol 35:191-203. -   McCaustland et al. (1991) J Virol Methods 35:331-342. -   McConkey et al. (2014) Eur Urol 66:609-910. -   McGall et al. (1996) Proc Nat Acad Sci USA 93:13555-13460. -   McLendon et al. (2008) Nature 455:1061-1068. -   McPherson et al. (1995) PCR 2: A Practical Approach, IRL Press, New     York, N.Y., United States of America. -   Millar et al. (1995) Anal Biochem 226:325-330. -   Natarajan et al. 
(1994) PCR Methods Appl 3:346-350. -   Neel et al. (2014) Mol Cancer Ther 13:122-133. -   Nelson et al. (2001) Anal Chem 73(1):1-7. -   Neuhaus et al. (2008) J Clin Oncol May 20 Suppl; Abstr LBA4504. -   Nones et al. (2014) Int J Cancer 135:1110-1118. -   O'Donnell et al. (1997) Anal Chem 69:2438-2443. -   Olive et al. (2009) Science 324:1457-1461. -   Özdemir et al. (2014) Cancer Cell 25:719-734. -   Paladichuk (1999) The Scientist 13:20-23. -   Parker et al. (2009) J Clin Oncol 27:1160-1167. -   Parkin et al. (2005) CA Cancer J Clin 55:74-108. -   Patil et al. (2015) Bioinformatics btv157 [Epub ahead of print]. -   PCT International Patent Application Publication Nos. WO     1993/009668; WO 1995/011755; WO 1997/014028; WO 1999/019515; WO     1999/032660; WO 1999/032660; WO 1999/063385; WO 2001/013120; WO     2001/014589; WO 2001/023082; WO 2004/046098; WO 2004/110244; WO     2006/089268; WO 2007/001324; WO 2007/056332; WO 2007/07025. -   Piétu et al. (1996) Genome Res 6:492-503. -   Prat et al. (2010) Breast Cancer Res 12:R68. -   Randolph & Waggoner (1995) Nucl Acids Res 25:2923-2929. -   Ratner & Castner (1997) in Vickerman, ed., Surface Analysis: The     Principal Techniques, John Wiley & Sons, New York, N.Y., United     States of America. -   Rhim et al. (2014) Cancer Cell 16:735-747. -   Robertson & Walsh-Weller (1998) Methods Mol Biol 98:121-154. -   Rose (2000) in Schena, ed., Microarray Biochip Technology, pp.     19-38, Eaton Publishing, Natick, Mass., United States of America. -   Roux (1995) PCR Methods Appl 4:S185-S194. -   Rubio-Viqueira et al. (2006) Clin Cancer Res 12:4652-4661. -   Rupp et al. (1988) BioTechniques 6:56-60. -   Salisbury et al. (2002) J Am Chem Soc 124:14868-14870. -   Sambrook & Russell (2001) Molecular Cloning: A Laboratory Manual,     3^(rd). Edition, Cold Spring Harbor Press, Cold Spring Harbor, N.Y.,     United States of America. -   Sapolsky & Lipshutz (1996) Genomics 33:445-456. -   Schena et al. (1995) Science 270:467-470. 
-   Schena et al. (1996) Proc Natl Acad Sci USA 93:10614-10619.
-   Schnelldorfer et al. (2008) Ann Surg 247:456-462.
-   Seong (2002) Clin Diagn Lab Immunol 9:927-930.
-   Shalon et al. (1996) Genome Res 6:639-645.
-   Shi et al. (2010) Nat Biotechnol 28:827-838.
-   Shi et al. (2011) BMC Bioinformatics 12:375.
-   Shoemaker et al. (1996) Nat Genet 14:450-456.
-   Shriver-Lake (1998) in Cass & Ligler, eds., Immobilized Biomolecules in Analysis, pp. 1-14, Oxford Press, Oxford, United Kingdom.
-   Silhavy et al. (1984) Experiments with Gene Fusions, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y., United States of America.
-   Smith (1998a) The Scientist 12(14):21-24.
-   Smith et al. (1998b) Clin Chem 44(9):2054-2056.
-   Southern (1975) J Mol Biol 98:503-517.
-   Stolze et al. (2015) Sci Rep 5:8535.
-   Strain & Chmielewski (2001) BioTechniques 30(6):1286-1291.
-   Stratford et al. (2010) PLoS Med 7:e1000307.
-   Stuart et al. (2004) Proc Natl Acad Sci USA 101:615-620.
-   Subramanian et al. (2005) Proc Natl Acad Sci USA 102:15545-15550.
-   Tanaka et al. (1994) J Gen Virol 75:2691-2698.
-   Theriault et al. (1999) in Schena, ed., DNA Microarrays: A Practical Approach, pp. 101-120, Oxford University Press Inc., New York, N.Y., United States of America.
-   Tibshirani et al. (2002) Proc Natl Acad Sci USA 99:6567-6572.
-   Tijssen (ed.) (1993) Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization With Nucleic Acid Probes, Part I: Theory and Nucleic Acid Preparation, Elsevier Press, New York, N.Y., United States of America.
-   Trapnell et al. (2012) Nat Protoc 7:562-578.
-   Tusher et al. (2001) Proc Natl Acad Sci USA 98:5116-5121.
-   U.S. Pat. Nos. 4,729,947; 5,143,854; 5,207,880; 5,230,781; 5,346,603; 5,360,523; 5,534,125; 5,571,388; 5,743,960; 5,800,992; 5,837,832; 5,843,767; 5,846,717; 5,871,918; 5,916,524; 5,965,352; 5,968,745; 5,974,164; 5,985,557; 5,994,069; 6,001,567; 6,017,696; 6,066,457; 6,086,737; 6,090,543; 6,123,819; 6,127,127; 6,162,603; 6,185,561; 6,225,059; 6,229,911; 6,245,508.
-   Van Kerckhoven et al. (1994) J Clin Microbiol 32:1669-1673.
-   Vignali (2000) J Immunol Methods 243(1-2):243-255.
-   Von Hoff et al. (2013) N Engl J Med 369:1691-1703.
-   Vonlaufen et al. (2008) Cancer Res 68:2085-2093.
-   Waddell et al. (2015) Nature 518:495-501.
-   Wamunyokoli et al. (2006) Clin Cancer Res 12:690-700.
-   Wang et al. (1989) Proc Natl Acad Sci USA 86:9717-9721.
-   Wang et al. (2010) Cancer Res 70:6448-6455.
-   Whitfield et al. (2002) Mol Biol Cell 13:1977-2000.
-   Williams (1989) BioTechniques 7:762-769.
-   Williams et al. (1990) Nucl Acids Res 18(22):6531-6535.
-   Winter et al. (2006) J Gastrointest Surg 10:1199-1210; discussion 1210-1211.
-   Witkiewicz et al. (2015) Nat Commun 6:6744.
-   Worley et al. (2000) in Schena, ed., Microarray Biochip Technology, pp. 65-86, Eaton Publishing, Natick, Mass., United States of America.
-   Yachida et al. (2010) Nature 467:1114-1117.
-   Yang et al. (1998) Science 282:2244-2246.
-   Yermilov et al. (2009) Ann Surg Oncol 16:554-561.
-   Yershov et al. (1996) Proc Natl Acad Sci USA 93:4913-4918.
-   Yoshihara et al. (2013) Nat Commun 4:2612.
-   Zhang et al. (2008) Nat Genet 40:862-870.
-   Zhong et al. (2015) PLoS One 6:e22129.
-   Zhu et al. (2001) Science 293:2101-2105.

It will be understood that various details of the presently disclosed subject matter may be changed without departing from the scope of the presently disclosed subject matter. Furthermore, the foregoing description is for the purpose of illustration only, and not for the purpose of limitation. 

1.-14. (canceled)
 15. A method of assaying a biological sample obtained from a subject comprising measuring a nucleic acid expression level of gene A and gene B for a plurality of gene pairs selected from the group consisting of the gene pairs of Tables 9, 10 and 11 in the biological sample obtained from the subject, wherein the subject has been diagnosed with a cancer.
 16. The method of claim 15, wherein the cancer is selected from the group consisting of pancreatic cancer, breast cancer and bladder cancer and the biological sample is obtained from the pancreas, the breast or the bladder, respectively.
 17. The method of claim 16, wherein the plurality of gene pairs are selected from Table 9.
 18. The method of claim 17, wherein the plurality of gene pairs selected from Table 9 comprises all of the gene pairs from Table 9.
 19. The method of claim 16, wherein the cancer is pancreatic cancer and the biological sample is obtained from the pancreas.
 20. The method of claim 19, wherein the plurality of gene pairs are selected from Tables 10 and 11.
 21. The method of claim 20, wherein the plurality of gene pairs selected from Table 10 comprises all of the gene pairs from Table 10.
 22. The method of claim 20, wherein the plurality of gene pairs selected from Table 11 comprises all of the gene pairs from Table 11.
 23. The method of claim 15, wherein the subject is a human.
 24. A method for treating cancer in a subject diagnosed with cancer, the method comprising: (a) measuring a nucleic acid expression level of gene A and gene B for a plurality of gene pairs selected from Table 9 in a biological sample obtained from the subject, wherein the subject has been diagnosed with either breast cancer, pancreatic cancer or bladder cancer; (b) classifying the subject as having a basal subtype of the cancer based on the nucleic acid expression levels of gene A and gene B in each gene pair from the plurality of gene pairs selected from Table 9, wherein the classifying comprises calculating a value d using EQUATION 1, $P_i = \begin{cases} 1 & \text{if } A_i > B_i \\ 0 & \text{if } B_i \geq A_i \end{cases}, \quad d = I + \sum_i P_i C_i, \quad \text{decision} = \begin{cases} \text{Basal} & \text{if } d > 0 \\ \text{Not Basal} & \text{if } d \leq 0 \end{cases} \qquad \text{(EQUATION 1)}$ wherein $A_i$ and $B_i$ are measured expression levels of each Gene A and each Gene B of Table 9 in the $i$-th row, respectively, $C_i$ is the $i$-th coefficient, and $I$ is the intercept, and further wherein if d is greater than 0, the subject is classified as having a basal subtype, and if d is less than or equal to 0, the subject is classified as having a not basal subtype; and (c) administering a treatment for the subject based on the subject being classified as having a basal subtype, wherein the treatment is selected from agents for treating the basal subtype listed in FIG. 20, agents for treating the basal subtype listed in FIG. 21, agents listed in Table 4, agents listed in Table 6 and combinations thereof.
 25. The method of claim 24, wherein the plurality of gene pairs selected from Table 9 comprises all of the gene pairs from Table 9.
 26. A method for treating pancreatic cancer in a subject diagnosed with pancreatic cancer, the method comprising: (a) measuring a nucleic acid expression level of gene A and gene B for a plurality of gene pairs selected from Table 10 or Table 11 in a biological sample comprising pancreatic cells obtained from the subject; (b) classifying the subject as having a normal stroma subtype of pancreatic cancer or an activated stroma subtype of pancreatic cancer based on the nucleic acid expression levels of gene A and gene B in each gene pair from the plurality of gene pairs selected from Table 10, wherein the classifying comprises calculating a value d using EQUATION 2, $P_i = \begin{cases} 1 & \text{if } A_i > B_i \\ 0 & \text{if } B_i \geq A_i \end{cases}, \quad d = I + \sum_i P_i C_i, \quad \text{decision} = \begin{cases} \text{Activated Stroma} & \text{if } d > 0 \\ \text{Normal Stroma} & \text{if } d \leq 0 \end{cases} \qquad \text{(EQUATION 2)}$ wherein $A_i$ and $B_i$ are measured expression levels of each Gene A and each Gene B of the plurality of gene pairs selected from Table 10 in the $i$-th row, respectively, $C_i$ is the $i$-th coefficient, and $I$ is the intercept, and further wherein if d is greater than 0, the subject is classified as having an activated stroma subtype, and if d is less than or equal to 0, the subject is classified as having a normal stroma subtype OR classifying the subject as having a basal-like subtype of pancreatic cancer or a classical subtype of pancreatic cancer based on the nucleic acid expression levels of gene A and gene B in each gene pair from the plurality of gene pairs selected from Table 11, wherein the classifying comprises calculating a value d using EQUATION 3, $P_i = \begin{cases} 1 & \text{if } A_i > B_i \\ 0 & \text{if } B_i \geq A_i \end{cases}, \quad d = I + \sum_i P_i C_i, \quad \text{decision} = \begin{cases} \text{Basal-like} & \text{if } d > 0 \\ \text{Classical} & \text{if } d \leq 0 \end{cases} \qquad \text{(EQUATION 3)}$ wherein $A_i$ and $B_i$ are measured expression levels of each Gene A and each Gene B of the plurality of gene pairs selected from Table 11 in the $i$-th row, respectively, $C_i$ is the $i$-th coefficient, and $I$ is the intercept, and further wherein if d is greater than 0, the subject is classified as having a basal-like subtype, and if d is less than or equal to 0, the subject is classified as having a classical subtype; and (c) administering a treatment for the subject based on the subject being classified as having a normal stroma subtype, an activated stroma subtype, a basal-like subtype or a classical subtype, wherein the treatment for the normal stroma subtype is surgery alone or surgery prior to treatment with agents selected from the agents listed in Table 3, wherein the treatment for the activated stroma subtype is selected from the group consisting of radiation, stroma modulation therapies noted in FIG. 20 and agents listed in or directed against the genes listed in Table 2, wherein the treatment for the basal-like subtype is cisplatin, oxaliplatin-based therapies, gemcitabine, chemotherapy with the agents listed in FIG. 20 and/or with agents listed in Tables 4 and 6 and/or against the genes listed in Tables 4 and 6, and wherein the treatment for the classical subtype is 5-fluorouracil, platinum-based therapy, surgery, or treatment prior to surgery with one or more agents listed in Table 5 or Table 6 or directed against the genes listed in Tables 5 and 6.
 27. The method of claim 26, wherein the plurality of gene pairs selected from Table 10 comprises all of the gene pairs from Table 10.
 28. The method of claim 26, wherein the plurality of gene pairs selected from Table 11 comprises all of the gene pairs from Table 11.
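EQUATIONS 1, 2 and 3 share one decision rule: each gene pair contributes its coefficient $C_i$ only when Gene A is expressed above Gene B, the contributions are added to the intercept $I$, and the sign of the resulting d selects the subtype. The following Python sketch illustrates that rule under stated assumptions. It is not the claimed method itself. The actual gene pairs, coefficients and intercepts are given in Tables 9-11, which are not reproduced here, so every number in the usage example is a hypothetical placeholder. The function names (`pair_score`, `decide`, `classify`) are also hypothetical, and the sketch assumes the two expression values within each pair are already on a directly comparable scale.

```python
from typing import Dict, Sequence, Tuple

# Labels (d > 0, d <= 0) for each decision rule, as recited in claims 24 and 26.
LABELS: Dict[str, Tuple[str, str]] = {
    "EQUATION 1": ("Basal", "Not Basal"),                   # Table 9 pairs
    "EQUATION 2": ("Activated Stroma", "Normal Stroma"),    # Table 10 pairs
    "EQUATION 3": ("Basal-like", "Classical"),              # Table 11 pairs
}


def pair_score(a: Sequence[float], b: Sequence[float],
               coefs: Sequence[float], intercept: float) -> float:
    """Return d = I + sum_i P_i * C_i, where P_i = 1 if A_i > B_i, else 0.

    a[i], b[i] are the measured expression levels of Gene A and Gene B in the
    i-th pair; coefs[i] is C_i; intercept is I (all hypothetical inputs here).
    """
    if not (len(a) == len(b) == len(coefs)):
        raise ValueError("a, b and coefs need one entry per gene pair")
    d = float(intercept)
    for a_i, b_i, c_i in zip(a, b, coefs):
        p_i = 1 if a_i > b_i else 0  # ties (B_i >= A_i) give P_i = 0
        d += p_i * c_i
    return d


def decide(d: float, labels: Tuple[str, str]) -> str:
    """Apply the threshold: first label if d > 0, second label if d <= 0."""
    positive, negative = labels
    return positive if d > 0 else negative


def classify(a: Sequence[float], b: Sequence[float], coefs: Sequence[float],
             intercept: float, equation: str) -> Tuple[float, str]:
    """Score one sample with the named equation and return (d, subtype label)."""
    d = pair_score(a, b, coefs, intercept)
    return d, decide(d, LABELS[equation])


if __name__ == "__main__":
    # Hypothetical three-pair example (NOT values from Table 11):
    # pair 1: 5.2 > 4.0 -> P=1; pair 2: 1.0 < 2.5 -> P=0; pair 3: tie -> P=0.
    d, label = classify([5.2, 1.0, 3.3], [4.0, 2.5, 3.3],
                        [1.5, -0.8, 2.0], -1.0, "EQUATION 3")
    print(d, label)  # 0.5 Basal-like
```

Because each $P_i$ depends only on the ordering of two genes measured in the same sample, the score does not require a between-sample normalization step. The strict comparison `a_i > b_i` mirrors the claims' assignment of ties to $P_i = 0$, and the `d > 0` test mirrors the assignment of $d = 0$ to the second label.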